NOVEL MAN MACHINE INTERFACES AND APPLICATIONS 



CROSS REFERENCE TO RELATED APPLICATIONS 

This application is a divisional application of Serial No. 09/138,285, filed August 21,
1998 and now USP *; which application claims benefit of (and hereby incorporates by
reference) provisional application Serial No. 60/056,639, filed August 22, 1997, and
provisional application Serial No. 60/059,561, filed September 19, 1997.

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION 

The invention relates to simple input devices for computers, well suited for use with 3-D
graphically intensive activities, and operating by optically sensing object or human
positions and/or orientations. The invention, in many preferred embodiments, uses real
time stereo photogrammetry using single or multiple TV cameras whose output is
analyzed and used as input to a personal computer.

DESCRIPTION OF RELATED ART

The closest known references to the stereo photogrammetric imaging of datums
employed by several preferred embodiments of the invention are thought to exist in the
fields of flight simulation, robotics, animation and biomechanical studies. Some early
prior art references in these fields are:

- Pugh USP 4,631,676;

- Birk USP 4,416,924;

- Pinckney USP 4,219,847;

- USP 4,672,564 by Egli et al, filed Nov 15, 1984;

- Pryor USP 5,506,682 Robot Vision Using Targets;

- Pryor, Method for Automatically Handling, Assembling & Working on Objects,
USP 4,654,949; and

- Pryor, USP 5,148,591, Vision Target Based Assembly.

In what is called "virtual reality", a number of other devices have appeared for human
instruction to a computer. Examples are head trackers, magnetic pickups on the human
and the like, which have their counterpart in the invention herein.

References from this field having similar goals to some aspects of the invention herein
are:

- US 5,297,061 by Dementhon et al

- US 5,388,059 also by Dementhon, et al

- US 5168531: Real-time recognition of pointing information from video, by Sigel

- US 5,617,312 Computer system that enters control information by means of
video camera by Iura et al, filed Nov 18, 1994

- US 5616078: Motion-controlled video entertainment system, by Oh, Ketsu

- US 5594469: Hand gesture machine control system, by Freeman, et al.

- US 5454043: Dynamic and static hand gesture recognition through low-level
image analysis by Freeman;

- US 5581276: 3D human interface apparatus using motion recognition based on
dynamic image processing, by Cipolla et al.

- US 4843568: Real time perception of and response to the actions of an
unencumbered participant/user by Krueger, et al

Iura and Sigel disclose means for using a video camera to look at an operator's body or
finger and input control information to a computer. Their disclosure is generally limited
to two dimensional inputs in an xy plane, such as would be traveled by a mouse used
conventionally.

Dementhon discloses the use of objects equipped with 4 LEDs detected with a single
video camera to provide a 6 degree of freedom solution of object position and
orientation. He downplays the use of retroreflector targets for this task.



Cipolla et al discuss processing and recognition of movement sequence gesture
inputs detected with a single video camera whereby objects or parts of humans
equipped with four reflective targets or LEDs are moved through space, and a sequence of
images of the objects taken and processed. The targets can be colored to aid
discrimination.

Pryor, one of the inventors, in several previous applications has described single and
dual (stereo) camera systems utilizing natural features of objects or special targets
including retroreflectors for determination of position and orientation of objects in real
time suitable for computer input, in up to 6 degrees of freedom.

Pinckney has described a single camera method for using and detecting 4 reflective
targets to determine position and orientation of an object in 6 degrees of freedom. A
paper by Dr. H. F. L. Pinckney entitled Theory and Development of an on line 30Hz
video photogrammetry system for real-time 3 dimensional control, presented at the
Symposium of Commission V Photogrammetry for Industry, Stockholm, Aug 1978,
together with many of the references referred to therein, gives many of the underlying
equations of solution of photogrammetry, particularly with a single camera. Another
reference, relating to use of two or more cameras, is Development of Stereo Vision for
Industrial Inspection, Dr. S.F. El-Hakim, Proceedings of the Instrument Society of
America (ISA) Symposium, Calgary Alta, April 3-5 1989. This paper too has several
useful references to the photogrammetry art.

Generally speaking, while several prior art references have provided pieces of the
puzzle, none has disclosed a workable system capable of widespread use, the variety
and scope of embodiments herein, nor the breadth and novelty of applications made
possible with electro-optical determination of object position and/or orientation.

In this invention, many embodiments may operate with natural features, colored targets,
self-illuminated targets such as LEDs, or with retroreflective targets. Generally the latter
two give the best results from the point of view of speed and reliability of detection, which
is of major importance to widespread dissemination of the technology.

However, of these two, only the retroreflector is both low cost and totally unobtrusive to
the user. Despite certain problems using same, it is the preferred type of target for
general use, at least for detection in more than 3 degrees of freedom. Even in only two
degrees, where standard "blob" type image processing might reasonably be used to find
one's finger for example (e.g., USP 5168531 by Sigel), use of simple glass bead
based, or molded plastic corner cube based, retroreflectors allows much higher
frequency response (e.g. 30Hz, 60Hz, or even higher detection rates) from the multiple
incidence angles needed in normal environments, also with lower cost computers under
a wider variety of conditions, and is more reliable as well (at least with today's PC
processing power).

SUMMARY OF THE INVENTION 

Numerous 3D input apparatus exist today. As direct computer input for screen
manipulation, the most common is the "Mouse" that is manipulated in x and y, and
through various artifices in the computer program driving the display, provides some
control in z-axis. In 3 dimensions (3-D) however, this is indirect, time consuming,
artificial, and requires considerable training to do well. Similar comments relate to
joysticks, which in their original function were designed for input of two angles.

In the computer game world as well, the mouse, joystick and other 2D devices prevail
today.

The disclosed invention is optically based, and generally uses unobtrusive specialized
datums on, or incorporated within, an object whose 3D position and/or orientation is
desired to be inputted to a computer. Typically such datums are viewed with a single TV
camera, or two TV cameras forming a stereo pair. A preferred location for the camera(s)
is proximate the computer display, looking outward therefrom, or to the top or side of the
human work or play space.

While many aspects of the invention can be used without specialized datums (e.g. using
the natural finger image itself, rather than a retro-reflective tape on one's finger), these
specialized datums have been found to work more reliably, and at lowest cost, using
technology which can be capable of wide dissemination in the next few years. This is
very important commercially. Even where only two-dimensional position is desired, such
as x, y location of a finger tip, this is still the case.

For degrees of freedom beyond 3, we feel such specialized datum based technology is
the only practical method today. Retroreflective glass bead tape, or beading, such as
composed of Scotchlite 7615 by 3M Co., provides a point, line, or other desirably
shaped datum which can be easily attached to any object desired, and which has high
brightness and contrast to surroundings such as parts of a human, clothes, a room etc,
when illuminated with incident light along the optical axis of the viewing optics such as
that of a TV camera. This in turn allows cameras to be used in normal environments,
with fast integration times capable of capturing common motions desired, and
allows datums to be distinguished easily, which greatly reduces computer processing
time and cost.

Retroreflective or other datums are often distinguished by color or shape as well as
brightness. Other suitable target datums can be distinguished just on color or shape or
pattern, but do not have the brightness advantage offered by the retroreflector. Suitable
retroreflectors can alternatively be glass, plastic or retroreflective glass bead paints,
and can be other forms of retroreflectors than beads, such as corner cubes. But the
beaded type is most useful. Shapes of datums found to be useful have been, for
example, dots, rings, lines, edge outlines, triangles, and combinations of the foregoing.

It is a goal of this invention to provide a means for data entry that has the following key
attributes among others:

• Full 3D (up to 6 degrees of freedom, e.g. x, y, z, roll, pitch, yaw) real time dynamic
input using artifacts, aliases, portions of the human body, or combinations thereof;

• Very low cost, due also to ability to share cost with other computer input functions 
such as document reading, picture telephony, etc.; 

• Generic versatility - can be used for many purposes, and saves as well on learning 
new and different systems for those purposes; 

• Unobtrusive to the user;

• Fast response, suitable for high speed gaming as well as desk use; 

• Compatible as input to large screen displays - including wall projections; 

• Unique ability to create physically real "Alias" or "surrogate" objects; 

• Unique ability to provide realistic tactile feel of objects in hand or against other
objects, without adding cost;

• A unique ability to enable "Physical" and "Natural" experience. It makes using 
computers fun, and allows the very young to participate. And it radically improves 
the ability to use 3D graphics and CAD systems with little or no training; 

• An ability to aid the old and handicapped in new and useful ways; 

• An ability to provide meaningful teaching and other experiences capable of reaching
wide audiences at low cost; and

• An ability to give life to a child's imagination through the medium of known objects and
software, without requiring high cost toys, and providing unique learning
experiences.

What is also unique about the invention here disclosed is that it unites all of the worlds
above, and more besides, providing the ability to have a common system that serves all
purposes well, at lowest possible cost and complexity.

The invention has a unique ability to combine what amounts to 3D icons (physical
artifacts) with static or dynamic gestures or movement sequences. This opens up,
among other things, a whole new way for people, particularly children, beginners and
those with poor motor or other skills, to interact with the computer. By manipulating a set
of simple tools and objects that have targets appropriately attached, a novice computer
user can control complex 2D and 3D computer programs with the expertise of a child
playing with toys!

The invention also acts as an important teaching aid, especially for small children and
the disabled, who have undeveloped motor skills. Such persons can, with the invention,
become computer literate far faster than those using conventional input devices such as
a mouse. The ability of the invention to use any desired portion of a human body, or an
object at the user's command, provides a massive capability for control, which can be
changed at will. In addition, the invention allows one to avoid carpal tunnel syndrome and
other effects of using keyboards and mice. One only needs to move through the air, so to
speak, or with economically advantageous artifacts.

The system can be calibrated for each individual to magnify even the smallest motion to
compensate for handicaps or enhance user comfort or other benefits (e.g. trying to work
in a cramped space on an airplane). If desired, unwanted motions can be filtered or
removed using the invention (in this case a higher number of camera images than
would normally be necessary is typically taken, and effects in some frames averaged,
filtered or removed altogether).
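By way of illustration only, the per-user magnification and frame averaging just described might be sketched as follows. The class name `MotionConditioner`, the `gain` and `window` parameters, and the use of the first sensed position as a rest point are all assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import deque


class MotionConditioner:
    """Scale and smooth a stream of sensed (x, y, z) positions.

    Hypothetical helper: 'gain' stands in for the per-user calibration
    and 'window' for the number of camera frames averaged together.
    """

    def __init__(self, gain=1.0, window=5):
        self.gain = gain
        self.history = deque(maxlen=window)  # most recent raw positions
        self.origin = None                   # first position seen (rest point)

    def update(self, position):
        """Feed one sensed position; return the conditioned position."""
        if self.origin is None:
            self.origin = position
        self.history.append(position)
        n = len(self.history)
        # Average the last few frames to suppress tremor or jitter.
        mean = tuple(sum(p[i] for p in self.history) / n for i in range(3))
        # Magnify the displacement from the rest point by the user's gain.
        return tuple(
            self.origin[i] + self.gain * (mean[i] - self.origin[i])
            for i in range(3)
        )


if __name__ == "__main__":
    conditioner = MotionConditioner(gain=4.0, window=2)
    for raw in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]:
        print(conditioner.update(raw))
```

A larger window gives smoother output at the price of added lag, which is one reason more camera images than strictly necessary would be taken when filtering is desired.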

The invention also provides for high resolution of object position and orientation at high
speed and at very low or nearly insignificant cost. And it provides for smooth input
functions without the jerkiness of mechanical devices such as a sticking mouse of the
conventional variety.

In addition, the invention can be used to aid learning in very young children and infants
by relating gestures of hands and other bodily portions or objects (such as rattles or
toys held by the child), to music and/or visual experiences via computer generated
graphics or real imagery called from a memory such as DVD disks or the like.

The invention is particularly valuable for expanding the value of life-size, near life size,
or at least large screen (e.g. greater than 42 inches diagonal) TV displays. Since the
projection can now be of this size at affordable cost, the invention allows an also
affordable means of relating in a lifelike way to the objects on the screen - to play with
them, to modify them, and otherwise interrelate using one's natural actions and the
naturally appearing screen size - which can also be in 3D using stereo display
techniques of whatever desired type.

DESCRIPTION OF FIGURES

Figure 1 illustrates basic sensing useful in practicing the invention. 

Figure 1a illustrates a basic two dimensional embodiment of the invention utilizing one or
more retroreflective datums on an object, further including means to share function with
normal imaging for internet teleconferencing or other activities.

Figure 1b illustrates a 3 Dimensional embodiment using single camera stereo with 3 or 
more datums on an object or wrist of the user. 

Figure 1c illustrates another version of the embodiment of fig 1a, in which two camera
"binocular" stereo cameras are used to image an artificial target on the end of a pencil.
Additionally illustrated is a 2 camera stereo and a line target plus natural hole feature on
an object.

Figure 1d illustrates a control flow chart of the invention.

Figure 1e is a flow chart of a color target processing embodiment.

Figure 2 illustrates Computer aided design system (CAD) related embodiments.

Figure 2a illustrates a first CAD embodiment according to the invention,
and a version for 3-D digitizing and other purposes.

Figure 2b describes another Computer Design embodiment with tactile feedback for
"whittling" and other purposes.

Figure 3 illustrates additional embodiments working virtual objects, and additional alias 
objects according to the invention. 

Figure 4 illustrates a car driving game embodiment of the invention, which in addition
illustrates the use of target-based artifacts and simplified head tracking with viewpoint
rotation. The car dash is for example a plastic model purchased or constructed to
simulate a real car dash, or can even be a make-believe dash (i.e. in which the dash is
made from for example a board, and the steering wheel from a dish), and the car is
simulated in its actions via computer imagery and sounds.

Figure 5 illustrates a one or two person airplane game according to the invention, to
further include inputs for triggering and scene change via movement sequences or
gestures of a player. Also illustrated in fig 5c is a hand puppet game embodiment of the
invention played if desired over remote means such as the Internet.

Figure 6 illustrates other movements such as gripping or touch which can be sensed by
the invention, and which can be useful as input to a computer system for the
purpose of signaling that a certain action is occurring.

Figure 7 illustrates further detail as to the computer architecture of movement
sequences and gestures, and their use in computer instruction via video inputs. Also
illustrated are means to determine position and orientation parameters with minimum
information at any point in time.

Figure 8 illustrates embodiments, some of which are a simulation analog of the design
embodiments above, used for Medical or dental teaching and other applications.

Figure 8a illustrates a targeted scalpel used by a medical student for simulated surgery, 
further including a compressible member for calculating out of sight tip locations. 

Figure 8c illustrates targeted instruments and targeted body model. 

Figure 8d illustrates a body model on a flexible support. 

Figure 8e illustrates a dentist doing real work with a targeted drill. 

Figure 8f shows how a surgeon can control the manipulation of a laparoscopic tool or a 
robot tool through the complex 3D environment of a body with the help of a targeted 
model of a body as an assembly of body parts. 

Figure 9 illustrates a means for aiding the movement of a person's hands while using the
invention in multiple degree of freedom movement.

Figure 10 illustrates a natural manner of computer interaction for aiding the movement
of a person's hands while using the invention in multiple degree of freedom movement
with one's arms resting on an armrest of a chair, car, or the like.

Figure 11 illustrates coexisting optical sensors for other variable functions in addition to
image data of scene or targets. A particular illustration of a level vial in a camera field
of view illustrates as well the establishment of a coordinate system reference for the
overall 3-6 degree of freedom coordinate system of the camera(s).

Figure 12 illustrates a touch screen employing target inputs from fingers or other objects
in contact or virtual contact with the screen, either of the conventional CRT variety, an
LCD screen, or a projection screen, including aerial projection in space. Calibration or
other functions via targets projected on the screen is also disclosed.

Figure 13 illustrates clothes design using preferred embodiments incorporating finger
touch, laser pointing and targeted material.

Figure 14 illustrates additional applications of alias objects such as those of figure 3, for
purposes of planning visualization, building toys, and inputs in general.

Figure 15 illustrates a sword play and pistol video game play of the invention using life
size projection screens, with side mounted stereo camera and head tracking audio
system (and/or TV camera/light source tracker).

Figure 16 illustrates an embodiment of the invention having a mouse and/or keyboard of
the conventional variety combined with targets of the invention on the user to give an
enhanced capability even to a conventional word processing or spreadsheet, or other
program. A unique portable computer for use on airplanes and elsewhere is disclosed.

Figure 17 illustrates an optically sensed keyboard embodiment of the invention, in this
case for a piano.

Figure 18 illustrates gesture based musical instruments such as violins and virtual
object musical instruments according to the invention, having synthesized tones and, if
desired, display sequences.

Figure 19 illustrates a method for entering data into a CAD system used to sculpt a car 
body surface. 

Figure 20 illustrates an embodiment of the invention used for patient or baby monitoring.

Figure 21 illustrates a simple embodiment of the invention for toddlers and preschool
age children, which is also useful to aid learning in very young children and infants by
relating gestures of hands and other bodily portions or objects such as rattles held by
the child, to music and/or visual experiences.

DETAILED DESCRIPTION OF THE INVENTION 

Figure 1a 

Figure 1a illustrates a simple single camera based embodiment of the invention. In this
case, a user 5 desires to point at an object 6 represented electronically on the screen 7
and cause the pointing action to register in the software contained in computer 8 with
respect to that object (a virtual object), in order to cause a signal to be generated to the
display 7 to cause the object to activate or allow it to be moved (e.g. with a subsequent
finger motion or otherwise). He accomplishes this using a single TV camera 10 located
typically on top of the screen as shown or alternatively to the side (such as 11) to
determine the position of his fingertip 12 in space, and/or the pointing direction of his
finger 13.
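As a hedged sketch of how a sensed fingertip position and pointing direction could be registered against an object on the screen, the following assumes both have already been expressed in a screen-centered coordinate frame in which the screen lies in the plane z = 0 and the user is at positive z. That frame, the function names `screen_hit` and `points_at`, and the rectangular object bounds are illustrative assumptions only.

```python
def screen_hit(fingertip, direction):
    """Intersect the pointing ray with the screen plane z = 0.

    fingertip: (x, y, z) of the fingertip, z > 0 in front of the screen.
    direction: (dx, dy, dz) pointing direction of the finger.
    Returns the (x, y) point on the screen, or None if the finger is
    pointing parallel to or away from the screen.
    """
    px, py, pz = fingertip
    dx, dy, dz = direction
    if dz == 0:
        return None
    t = -pz / dz          # ray parameter at which z reaches 0
    if t <= 0:
        return None       # the screen is behind the pointing direction
    return (px + t * dx, py + t * dy)


def points_at(hit, rect):
    """True if the hit point lies inside rect = (x_min, y_min, x_max, y_max)."""
    if hit is None:
        return False
    x, y = hit
    x_min, y_min, x_max, y_max = rect
    return x_min <= x <= x_max and y_min <= y <= y_max


if __name__ == "__main__":
    hit = screen_hit((0.10, 0.20, 0.50), (0.2, 0.0, -1.0))
    print(hit, points_at(hit, (0.15, 0.15, 0.25, 0.25)))
```

In such a scheme the software in the computer would test the hit point against each displayed object and generate the activation signal for the one pointed at.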

It has been proposed by Sigel and others to utilize the natural image of the finger for
this purpose and certain US patents address this in the group referenced above.
Copending applications by one of the inventors (Tim Pryor) also describe finger related
activity.

As disclosed in said co-pending application, it is, however, often desirable to use
retro-reflective material on the finger, disclosed herein as either temporarily attached to the
finger as in jewelry, or painted on the finger using retro-reflective coating "nail polish", or
adhered to the finger such as with adhesive tape having a retro-reflective coating. Such
coatings are typically those of Scotchlite 7615 and its equivalent that have high specific
reflectivity, contrasting well to their surroundings to allow easy identification. The
brightness of the reflection allows dynamic target acquisition and tracking at lowest cost.

The camera system employed for the purposes of low cost desirable for home use is
typically that used for Internet video conferencing and the like today. These cameras
are CCD, and more recently CMOS, cameras having low cost (25-100 dollars) yet
relatively high pixel counts and densities. It is considered that within a few years these
will be standard on all computers, for all intents and purposes "free" to the applications
here proposed, and interfaced via "fire wire" (IEEE 1394) or USB (universal serial bus).

The use of retroreflective and/or highly distinctive targets (e.g. bright orange triangles)
allows reliable acquisition of the target in a general scene, and does not restrict the
device to pointing on a desktop application under controlled lighting as shown in Sigel or
others. Active (self luminous) targets such as LEDs also allow such acquisition, but are
more costly, cumbersome and obtrusive and generally less preferable.
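To illustrate why a bright retroreflective datum is inexpensive to acquire, the following minimal sketch locates such a target in a gray level image by simple thresholding and an intensity weighted centroid. The threshold value, the representation of the image as a list of rows of 0-255 gray levels, and the function name are assumptions for illustration, and a practical system would also have to handle multiple targets and stray glints.

```python
def find_bright_target(image, threshold=200):
    """Return the (row, col) centroid of pixels at or above threshold.

    image: list of rows, each a list of gray levels (0-255).
    Returns None when no pixel reaches the threshold.
    """
    total = 0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value >= threshold:
                # Weight each bright pixel by its gray level so the
                # centroid can fall between pixel centers.
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0:
        return None
    return (row_sum / total, col_sum / total)


if __name__ == "__main__":
    frame = [
        [10, 10, 10, 10],
        [10, 250, 250, 10],
        [10, 10, 10, 10],
    ]
    print(find_bright_target(frame))
```

Because the retroreflector is far brighter than the rest of the scene when lit along the camera axis, a single pass over the image of this kind may suffice, which is consistent with the low processing cost noted above.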

If we consider camera system 10 sitting on top of the screen 7 and looking at the user
or more particularly, the user's hand, in a normal case of Internet telephony there is a
relatively large field of view so that the user's face can also be seen. This same field of
view can be used for this invention but it describes a relatively large volume. For higher
precision, add-on lenses or zoom lenses on the camera may be used to increase the
resolution.

Or it is possible according to the invention to have a plurality of cameras, one used for
the Internet and the other used for the input application here described. Indeed with the
ever dropping prices, the price of the actual camera including the plastic lens on the
CMOS chip is so low, it is possible perhaps even to have multiple cameras with fixed
magnifications, each having a separate chip!

These can easily be daisy chained with either fire wire or USB such that they can
be selected at will electronically according to the different magnifications or pointing
directions desired.



Let us now return to the question of determining location or orientation of a human
portion such as typically a hand, or finger - in this case, a finger. In order to make this
invention operate at the lowest possible cost it is desirable that the lighting available be
low cost as well. Indeed if the camera units are shared with telephony using the natural
lighting of the object, then the specialized lighting required for the retro-reflectors
adds cost to the system. The power for the lighting, such as LEDs, can generally be
conveyed over the USB or 1394 bus however.

The user can also point or signal with an object such as 15 having datum 16 on it, such 
as a retroreflective dot 16 or line target 17. 

It is possible to expand the sensing of 2D positions described above into 3, 4, 5 and 6
dimensions (x, y plus z, pitch, yaw, roll). Two sensing possibilities, of the many possible,
are described in various embodiments herein.

1. The first, illustrated in fig 1a and b, is to utilize a single camera, but multiple discrete
features or other targets on the object which can provide a multidegree of freedom
solution. In one example, the target spacing on the object is known a priori and
entered into the computer manually or automatically from software containing data
about the object, or can be determined through a taught determining step.

2. The second is a dual camera solution shown in fig 1c and d that does not require a
priori knowledge of targets and in fact can find the 3D location of one target by itself,
useful for determining finger positions for example. For 6 degree of freedom
information, at least three point targets are required, although line targets, and
combinations of lines and points, can also be used.

Figure 1b illustrates a 3-D (3 Dimensional) sensing embodiment using single camera
stereo with 3 or more datums on a sensed object, or in another example, the wrist of the
user.

As shown the user holds in his right hand 29, object 30 which has at least 3 visible
datums 32, 33, and 34 which are viewed by TV camera 40 whose signal is processed
by computer 41 which also controls projection display 42. TV camera 40 also views 3
other datums 45, 46 and 47, on the wrist 48 of the user's left hand, in order to determine
its orientation or rough direction of pointing of the left hand 51, or its position relative to
object 30, or any other data (e.g. relation to the screen position or other location related
to the mounting position of the TV camera, or to the user's head if viewed, or whatever).
The position and orientation of the object and hand can be determined from the 3 point
positions in the camera image using known photogrammetric equations (see Pinckney,
reference USP #4,219,847 and other references in papers referenced).

As an alternative to the 3 discrete point targets, a colored triangular target for example
can be used in which the intersections of lines fitted to its sides define the target datums,
as discussed below.
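The derivation of three point datums from a triangular target can be sketched as below, under the assumption that a line has already been fitted to each side of the triangle in the image. Here each line is held as coefficients (a, b, c) of a x + b y = c, and the helper `line_through`, which simply passes a line through two edge points, stands in for whatever line fitting step is actually used; both are illustrative choices and not the method of the disclosure.

```python
def line_through(p, q):
    """Line a*x + b*y = c through image points p and q (stand-in for a fit)."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    c = a * p[0] + b * p[1]
    return (a, b, c)


def intersect(line1, line2):
    """Intersection of two lines given as (a, b, c), using Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x, y)


def triangle_datums(side_a, side_b, side_c):
    """Three corner datums defined by the pairwise side intersections."""
    return [
        intersect(side_a, side_b),
        intersect(side_b, side_c),
        intersect(side_c, side_a),
    ]


if __name__ == "__main__":
    sides = [
        line_through((0, 0), (4, 0)),
        line_through((4, 0), (0, 4)),
        line_through((0, 4), (0, 0)),
    ]
    print(triangle_datums(*sides))
```

Since each side is fitted to many edge pixels, corners found this way can be located more finely than a single pixel, and remain defined even if a corner itself is partly hidden.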

It is also possible to use the camera 40 to see other things of interest as well. For
example, the direction of pointing of the user at an object 55 represented on display 42 is
determined from datum 50 on finger 52 of the user's left hand 51 (whose wrist position and
attitude can also be determined).

Alternatively, the finger can be detected just from its general gray level image, and can
be easily identified in relation to the targeted wrist location (especially if the user, as
shown, has clenched his other fingers such that the finger 52 is the only one extended
on that hand).

The computer can process the gray level image using known techniques, for example
blob and other algorithms packaged with the Matrox brand Genesis image processing
board for the PC, and determine the pointing direction of the finger using the knowledge
of the wrist gained from the datums. This allows the left hand finger 50 to alternatively
point at a point (or touch a point) to be determined on the object 30 held in the right
hand as well.


Figure 1c 

Figure 1c illustrates another version of the embodiments of fig 1a and b, in which two
camera "binocular" stereo cameras 60 and 61 processed by computer 64 are used to
image artificial target (in this case a triangle, see also fig 2) 65, on the end of pencil 66,
and optionally to improve pointing resolution, target 67 on the tip end of the pencil,
typically a known small distance from the tip (the user and his hand holding the pencil
are not shown for clarity). This imaging allows one to track the pencil tip position in order
to determine where on the paper (or TV screen, in the case of a touch screen) the
pencil is contacting (see also fig 2, and fig 12).

For best results it is often desirable to have independently controllable near coaxial light
sources. Sources 62 and 63 are shown controlled by computer 64 to provide illumination of
retroreflective targets for each camera independently. This is because at different
approach angles the retroreflector reflects differently, and since the cameras are often
angularly spaced (e.g. by non-zero angle A), they do not see a target the same.

Numerous other camera arrangements, processing, computation, and other issues are
discussed in general relative to accurate determination of object positions using two or
more camera stereo vision systems in the S.F. El-Hakim paper referenced above and
the additional references referred to therein.
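As a simplified illustration of two camera position determination, and not the full photogrammetric solution of the cited references, the sketch below finds the 3D location of a single target from two viewing rays. It assumes each camera has been calibrated so that a target image point can be converted to a ray (a camera center and a direction) in a common coordinate frame; that conversion, and the function names used, are assumptions outside the sketch. The target is taken as the midpoint of the shortest segment joining the two rays.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def along(origin, direction, s):
    """Point at parameter s along a ray."""
    return tuple(o + s * d for o, d in zip(origin, direction))


def triangulate(center1, ray1, center2, ray2):
    """Closest-point (midpoint) triangulation of two viewing rays.

    Returns (point, gap): the estimated 3D target position and the
    distance between the rays at closest approach (a rough quality check).
    """
    w0 = sub(center1, center2)
    a, b, c = dot(ray1, ray1), dot(ray1, ray2), dot(ray2, ray2)
    d, e = dot(ray1, w0), dot(ray2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel")
    s = (b * e - c * d) / denom   # parameter along ray 1
    t = (a * e - b * d) / denom   # parameter along ray 2
    p1 = along(center1, ray1, s)
    p2 = along(center2, ray2, t)
    point = tuple((u + v) / 2 for u, v in zip(p1, p2))
    gap = math.sqrt(dot(sub(p1, p2), sub(p1, p2)))
    return point, gap


if __name__ == "__main__":
    cam1, cam2 = (-0.1, 0.0, 0.0), (0.1, 0.0, 0.0)
    target = (0.05, 0.02, 0.5)
    print(triangulate(cam1, sub(target, cam1), cam2, sub(target, cam2)))
```

With noisy image measurements the two rays will not meet exactly, and the returned gap gives one simple indication of how well the two views agree.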

The computer can also acquire the stereo image of the paper and the targets in its four
corners, 71-74. Solution of the photogrammetric equation allows the position of the
paper in space relative to the cameras to be determined, and thence the position of the
pencil, and particularly its tip, relative to the paper, which is passed to display means 75 or
another computer program. Even without the target on the end, the pointing direction
can be determined from target 65, and knowing the length of the pencil, the tip position
calculated.
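The tip calculation just described reduces to simple vector arithmetic once the target position and pointing direction are known in 3D. The sketch below assumes those quantities, and three of the paper corner targets, are available in a common coordinate frame; the function names and the choice of three corners to define the paper plane are illustrative assumptions.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def unit(v):
    n = math.sqrt(dot(v, v))
    if n == 0:
        raise ValueError("zero-length vector")
    return tuple(x / n for x in v)


def pencil_tip(target_pos, pointing_dir, target_to_tip):
    """Tip = target position plus the known length along the pointing direction."""
    d = unit(pointing_dir)
    return tuple(p + target_to_tip * c for p, c in zip(target_pos, d))


def height_above_paper(tip, corner_a, corner_b, corner_c):
    """Signed distance of the tip from the plane through three paper corners.

    The sign depends on the order in which the corners are given.
    """
    normal = unit(cross(sub(corner_b, corner_a), sub(corner_c, corner_a)))
    return dot(sub(tip, corner_a), normal)


if __name__ == "__main__":
    tip = pencil_tip((0.10, 0.10, 0.15), (0.0, 0.0, -2.0), 0.10)
    corners = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.3, 0.0)]
    print(tip, height_above_paper(tip, *corners))
```

A height near zero could be taken to indicate that the pencil is contacting the paper, with the in-plane coordinates of the tip then passed on to the display or other program.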

A line target 76 can also be useful on the pencil, or a plurality of line targets spaced
circumferentially can be of use in defining the pencil pointing direction from the
stereo image pair.

A working volume of the measurement system is shown in dotted lines 79 - that is the
region on and above the desk top in this case where the sensor system can operate
effectively. Typically this is more than satisfactory for the work at hand. It is noted that
due to possible compound inclination of the cameras, and other geometric
considerations, the effective working volume for any given accuracy or resolution
criteria, does not necessarily have parallel sides.

It is noted that the dual (stereo pair) camera system of fig 1 has been extensively
tested and can provide highly accurate position and orientation information in up to 6
degrees of freedom. One particular version using commercial CCD black and white
cameras and a Matrox "Genesis" framegrabber and image processing board, and
suitable stereo photogrammetry software running in an Intel Pentium 300 MHz based
computer, has characteristics well suited to input from a large desktop CAD station for
example. This provides 30Hz updates of all 6 axes (x y z roll pitch and yaw) data over a
working volume of 0.5 meter x 0.5 meter in x and y (the desktop, where cameras are
directly overhead pointing down at the desk) and 0.35 meters in z above the desk, all to
an accuracy of 0.1 mm or better, when used with clearly visible round retroreflective
(Scotchlite 7615 based) datums approx. 5-15 mm in diameter on an object for example.
This is accurate enough for precision tasks such as designing objects in 3D CAD
systems, a major goal of the invention.

The cameras in this example are mounted overhead. If mounted to the side or front, or
at an angle such as 45 degrees to the desktop, the z axis becomes the direction
outward from the cameras.

Figure 1c additionally illustrates a 2 camera stereo arrangement, used in this case to
determine the position and orientation of an object having a line target, and a datum on
a portion of the user. Here, cameras 60 and 61 are positioned to view a retro-reflective
line target 80, in this case running part of the length of a toy sword blade 81. The line
target in this case is made as part of the plastic sword, and is formed of molded-in
corner cube reflectors similar to those in a tail light reflector on a car. It may also be made
to be one unique color relative to the rest of the sword, and the combination of the two
gives an unmistakable indication.

There are typically no other bright lines in any typical image when viewed
retroreflectively. This also illustrates how target shape (i.e. a line) can be used to
discriminate against unwanted other glints and reflections which might comprise a few
bright pixels' worth in the image. It is noted that a line type of target can be cylindrical in
shape if wrapped around a cylindrical object, which can then be viewed from multiple
angles.

Matching of the two camera images and solution of the photogrammetric equations
gives the line target pointing direction. If an additional point is used, such as 82, the full 6
degree of freedom solution of the sword is available. Also shown here is yet another
point, 83, which serves two purposes: it allows an improved photogrammetric
solution, and it serves as a redundant target in case 82 can't be seen due to
obscuration, obliteration, or the like.

This data is calculated in computer 64, and used to modify a display on screen 75 as
desired, as further described in figure 15.

In one embodiment a Matrox Genesis frame processor card on an IBM 300 MHz PC was
used to read both cameras, and process the information at the camera frame rate of
30 Hz. Such line targets are very useful on sleeves of clothing, seams of gloves for
pointing, rims of hats, and other decorative and practical purposes, for
example outlining the edges of objects or portions thereof, such as holes and openings.

Typically the cameras 60 and 61 have magnifications and fields of view which are
equal, and overlap in the volume of measurement desired. The axes of the cameras can
be parallel, but for operation at ranges of a few meters or less, are often inclined at an
acute angle A with respect to each other, so as to increase the overlap of their fields of
view, particularly if larger baseline distances d are used for increased accuracy (albeit
with less z range capability). For example, for a CAD drawing application, A can be 30-45
degrees, with a base line of 0.5 to 1 meter, whereas for a video game such as
figure 5, where z range could be 5 meters or more, the angle A and the base line would
be less, to allow a larger range of action.
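The trade between baseline and range accuracy can be made concrete with the standard relation for an idealized parallel-axis camera pair. This is offered only as an illustration under stated assumptions: the cameras are treated as identical pinhole cameras with parallel axes (the inclined arrangement described above behaves similarly but not identically), and the symbols f (focal length) and p (disparity between the two images) are introduced here for the example.

```latex
z = \frac{f\,d}{p}
\qquad\Longrightarrow\qquad
\left|\delta z\right| \approx \frac{z^{2}}{f\,d}\,\left|\delta p\right|
```

For a fixed image measurement error \delta p, the range error grows with the square of the range z and shrinks in proportion to the baseline d, which is why a desktop CAD arrangement can use a long baseline for accuracy while a game with a 5 meter range accepts a coarser z measurement.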

Data base 

The datums on an object can be known a priori relative to other points on the object,
and to other datums, by selling or otherwise providing the object designed with such
knowledge to a user and including with it a CD ROM disc or other computer interfacable
storage medium having this data. Alternatively, the user or someone else can teach the
computer system this information. This is particularly useful when the datums are
applied by the user on arbitrary objects.

Figure 1d 

Illustrated here are steps used in the invention relating to detection of a single point to
make a command, in this case the position (or change of position, i.e. movement) of a
finger tip in figure 12 having a retroreflective target 1202 attached, detected by a stereo pair
of TV cameras 1210, using a detection algorithm which in its simplest case is based on
thresholding the image to see only the bright target indication from the finger (and
optionally, any object associated therewith such as a screen to be touched, for
example).

If this is insufficient to unambiguously define the datum on the finger, added algorithms
may be employed which are themselves known in the art (many of which are commonly
packaged with image analysis frame grabber boards such as the Matrox Genesis). The
processes can include, for example:

a brightness detection step relative to surroundings, or to immediate surroundings
(contrast);

a shape detection step, in which a search for a shape is made, such as a circle, ring,
triangle, etc.;

a color detection step, where a search for a specific color is made;

a movement step, wherein only target candidates which have moved from a location
in a previous TV image are viewed.

Each step may process only those candidates passing the previous step, or each may be
performed independently and the results compared later. The order of these steps can
be changed, but each adds to further identify the valid indication of the finger target.

Next the position of the targeted finger is determined by comparing the difference in
location of the finger target in the two camera images of the stereo pair. There is no
matching problem in this case, as a single target is used, which appears as only one
found point in each image.
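The comparison of target locations in the two images can be sketched for the simplest geometry. This is an illustration under assumptions, not the photogrammetric solution used in the tested system: it assumes two identical pinhole cameras with parallel axes separated along x, image coordinates measured in pixels from each image center, and hypothetical parameter names (`focal_px`, `baseline_m`).

```python
def triangulate_parallel(u_left, v_left, u_right, focal_px, baseline_m):
    """Locate one target from its image position in a parallel-axis stereo pair.

    u_left, v_left -- target position in the left image, pixels from image center
    u_right        -- target horizontal position in the right image, same convention
    focal_px       -- focal length expressed in pixels (assumed known from calibration)
    baseline_m     -- distance between the two camera centers, in meters
    Returns (x, y, z) in meters, relative to the left camera.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("target must have positive disparity to be in front of the cameras")
    z = focal_px * baseline_m / disparity   # range from the disparity
    x = u_left * z / focal_px               # back-project through the left camera
    y = v_left * z / focal_px
    return x, y, z
```

With inclined cameras, as in the arrangement of figure 1c, the images would first have to be corrected for the inclination angle A, or the full photogrammetric equations solved instead.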

After the image of the finger (or other tool) tip is found, its location is computed relative to
the screen or paper, and this data is inputted to the computer controlling the display to
modify same, for example the position of a drawing line or an icon, or to determine a
vector of movement on the screen.

Motion detection.

The computer 8 can be used to analyze incoming TV image based signals and
determine which points are moving in the image. This is helpful to eliminate background
data which is stationary, since oftentimes only moving items such as a hand or object
are of interest. In addition, the direction of movement is in many cases the answer
desired, or even the fact that a movement occurred at all.

A simple way to determine this is to subtract an image of high contrast retroreflective
targets from a first image and determine which parts are different, essentially
representing movement of the points. Small changes in lighting or other effects are not
registered. There are clearly more sophisticated algorithms as well.
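The subtraction just described can be sketched with NumPy arrays standing in for two grayscale TV frames. The function name and the `min_change` threshold are assumptions chosen for the example; the threshold is what keeps small lighting changes from registering.

```python
import numpy as np

def moving_mask(previous, current, min_change=40):
    """Return a boolean mask of pixels that changed between two grayscale frames.

    previous, current -- 2D uint8 arrays of equal shape (hypothetical frame buffers)
    min_change        -- gray-level difference below which a change is ignored (assumed value)
    """
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(current.astype(np.int16) - previous.astype(np.int16))
    return diff > min_change
```

A bright retroreflective target that moves shows up twice in the mask, once where it left and once where it arrived, so later steps would normally keep only the changed pixels that are bright in the current frame.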

Motion pre processing is useful when target contrast is not very high, as it allows one to
get rid of extraneous regions and concentrate all target identification and measurement
processing on the real target items.

Such processing is also useful when two camera stereo is used, as only moving points
are considered in image matching, which is a problem when there are lots of points in the
field.

Can it be assumed that the object is moving? The answer is yes if it's a game or many
other activities. However, there may be an issue of speed of movement. Frame to frame
is probably the criterion in a game, namely 30 Hz for a typical camera. However, in some
cases movement might be defined as something much slower, e.g. 3 Hz for a CAD
system input using deliberate motion of a designer.

Once the moving datum is identified, the range can be determined, and if the object
is then tracked from that point onward even when not moving, the range measurement gives
a good way to lock onto the object using more than just 2 dimensions.

One might actually use an artificial movement of the target if one doesn't naturally exist.
This could be done by causing it to vibrate. If one or more LEDs are used as a target,
they can be made to blink, which also shows up in an image subtraction (image with LED
on vs. image with LED off). The same is true of a target which changes color, showing
up in subtraction of color images.

Image subtraction or other computer processing operations can also be useful in
another sense. One can also subtract background, energizing the retroreflective
illumination light with no retroreflective targets present, and then with them. One idea is
simply to take a picture of a room or other work space, then bring in the targeted
object and subtract the first picture. The net result is that
any bright features in the space which are not of concern, such as bright door knobs,
glasses, etc., are eliminated from consideration.

This can also be done with colored targets, doing a color based image subtraction,
which is especially useful when one knows the desired colors a priori (as one would, or
could, via a teach mode).

A flow chart is shown in figure 1d illustrating the steps as follows:

A. Acquire images of stereo pair;

B. Optionally preprocess images to determine if motion is present. If so, pass to the
next step; otherwise do not, or proceed anyway (as desired);

C. Threshold images;

D. If light is insufficient, change light or other light gathering parameter such as
integration time;

E. Identify target(s);

F. If not identifiable, add other processing steps such as a screen for target color,
shape, or size;

G. Determine centroid or other characteristic of target point (in this case a retro dot
on finger);

H. Perform auxiliary matching step if required;

I. Compare location in stereo pair to determine range z and x y location of target(s);

J. Auxiliary step of determining location of targets on screen if screen position is not
known to computer program. Determine via targets on screen housing or
projected on to screen, for example;

K. Determine location of target relative to screen;

L. Determine point in display program indicated;

M. Modify display and program as desired.
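Steps C, E and G above can be sketched for the simplest case of a single bright target. This is an illustrative sketch only: it assumes one grayscale image held as a NumPy array, a fixed threshold value chosen for the example, and an intensity-weighted centroid as the "characteristic of target point".

```python
import numpy as np

def target_centroid(image, threshold=200):
    """Threshold a grayscale image and return the centroid of the bright pixels.

    image     -- 2D array of gray levels (hypothetical frame from one camera)
    threshold -- gray level at or above which a pixel counts as target (assumed value)
    Returns (x, y) in pixel units (column, row), or None if nothing is bright enough.
    """
    img = np.asarray(image, dtype=float)
    mask = img >= threshold                 # step C: threshold
    if not mask.any():
        return None                         # step D would adjust lighting or integration time
    rows, cols = np.nonzero(mask)           # step E: pixels identified as target
    weights = img[rows, cols]
    total = weights.sum()
    # Step G: intensity-weighted centroid, which gives a subpixel location.
    return (cols * weights).sum() / total, (rows * weights).sum() / total
```

Running this on both images of the stereo pair gives the two image locations compared in step I. With several targets in view, the bright pixels would first have to be grouped into separate blobs.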



The simple version of the invention here disclosed answers several problems
experienced in previous attempts to implement such inputs to computers:

1. Computational intensity;

2. Latency (frequency response, time to get position or orientation answer);

3. Noise (unreliability caused by ambient electronic, processing, or other
conditions);

4. Lighting (unreliability caused by ambient illumination, processing, or other
conditions);

5. Initialization;

6. Background problems, where the situation background cannot be staged, as in a
CAD system input on a desk.

It achieves this simply and at low cost particularly because of the function of the
retroreflector targets used, which help answer all 6 needs above. When combined with
color and/or shape detection, the system can be highly reliable, fast, and low cost. In
some more controlled cases, having slower movements and more uniform backgrounds
for example, retro material is not needed.



Figure 1e 



The following is a multi-degree of freedom image processing description of a triangular
shaped color target (disclosed itself in several embodiments of the invention herein)
which can be found optically using one or more cameras to obtain the 3 dimensional
location and orientation of the target using a computer based method described below.
It uses color processing to advantage, as well as a large number of pixels for highest
resolution, and is best for targets that are defined by a large number of pixels in the
image plane, typically because the target is large, or the cameras are close to the
target, or the camera field is composed of a very large number of pixels.

The method is simple but unique in that 1) it can be applied in a variety of degrees to
increase the accuracy (albeit at the expense of speed), 2) it can be applied with 1 or more
cameras (more cameras increase accuracy), and 3) it can utilize the combination of the
target's colors and triangles (1 or more) to identify the tool or object. It utilizes the edges
of the triangles to obtain subpixel accuracy. A triangle edge can even have a gentle
curve and the method will still function well. Other geometric shapes can also be
processed similarly in some cases.

The method is based on accurately finding the 3 vertices (F0,G0,F1,G1,F2,G2) of each
triangle in the camera field by accurately defining the edges and then computing the
intersection of these edge curves. This is generally more accurate than finding 3 or 4
points from spot centroids. However, the choice of which to use often comes down to
which is more pleasing to the consumer, or more rugged and reliable in use.

The preferred implementation uses 1 or more color cameras to capture a target
composed of a brightly colored right triangle on a rectangle of different brightly colored
background material. The background color and the triangle color must be two colors
that are easily distinguished from the rest of the image. For purposes of exposition we
will describe the background color as a bright orange and the triangle as aqua.

By using the differences between the background color and the triangle color, the
vertices of the triangle can be found very accurately. If there is more than one triangle
on a target, a weighted average of location and orientation information can be used to
increase accuracy.

The method starts searching for a pixel with the color of the background or of the
triangle, beginning with the pixel location of the center of the triangle from the last frame.
Once a pixel with the triangle "aqua" color is found, the program marches in four
opposite directions until each march detects a color change indicative of an edge
dividing the triangle and the "orange" background. Next, the method extends the edges
to define three edge lines of the triangle with a least squares method. The intersection
points of the resulting three lines are found, and serve as rough estimates of the triangle
vertices. These can serve as input for applications that don't require high accuracy.
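The line fitting and intersection steps can be sketched as below. The sketch makes two assumptions that are not stated in the description: the edge pixels found along each of the three edges are already grouped into three point lists, and the fit is a total (orthogonal) least squares fit via the singular value decomposition, chosen here because it handles vertical edges without special cases. All function names are hypothetical.

```python
import numpy as np

def fit_line(points):
    """Fit a line a*x + b*y = c (with a^2 + b^2 = 1) to 2D points by total least squares."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the line normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return normal[0], normal[1], float(normal @ centroid)

def intersect(line1, line2):
    """Return the (x, y) intersection of two lines given as (a, b, c)."""
    a = np.array([line1[:2], line2[:2]], dtype=float)
    c = np.array([line1[2], line2[2]], dtype=float)
    return np.linalg.solve(a, c)   # raises LinAlgError if the lines are parallel

def triangle_vertices(edge_points):
    """Estimate the 3 vertices from three lists of edge points, one list per edge."""
    lines = [fit_line(p) for p in edge_points]
    return [intersect(lines[i], lines[(i + 1) % 3]) for i in range(3)]
```

For example, edge points sampled from the sides of a right triangle with legs along the image axes return its three corners, which then serve as the provisional vertices.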

If better accuracy is desired, these provisional lines are then used as a starting point for
the subpixel refinement process. Each of these 3 lines is checked to see if it is mainly
horizontal. If a line is mainly horizontal, then a new line will be determined by fitting a
best fit of a curve through the pixel in each column that straddles the provisional line. If
a line is mainly vertical, then the same process proceeds on rows of pixels.

The color of each pixel crossed by a line is translated into a corresponding numeric
value. A completely aqua pixel would receive the value 0, while a completely orange
pixel would receive the value 1. All other colors produce a number between 0 and 1,
based on their relative amounts of aqua and orange. This numeric value, V, assigned to
a pixel is a weighted average of the color components (such as the R, G, B values) of
the pixel. If the components of the calibrated aqua are AR, AG, AB and those of orange
are OR, OG, OB, and the pixel components are PR, PG, PB, then the numeric value V
is:

V = WR * CR + WG * CG + WB * CB

with WR, WG, WB being weighting constants between 0 and 1, and CR is defined as:

A flow chart is shown in fig 2a.

The same process can be used to define CG and CB.
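The computation of V can be sketched as follows. The definition of CR is not reproduced in the text above, so the sketch assumes CR = (PR - AR) / (OR - AR), and likewise for CG and CB; this is simply the form that gives 0 for calibrated aqua and 1 for calibrated orange. It further assumes that the weights sum to 1 and that out-of-range values are clamped to the interval 0 to 1. None of these choices is confirmed by the description.

```python
def orangeness(pixel, aqua, orange, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Map a pixel color to V in [0, 1]: 0 for calibrated aqua, 1 for calibrated orange.

    pixel, aqua, orange -- (R, G, B) triples: PR,PG,PB / AR,AG,AB / OR,OG,OB
    weights             -- WR, WG, WB (assumed here to sum to 1)
    """
    value = 0.0
    used_weight = 0.0
    for p, a, o, w in zip(pixel, aqua, orange, weights):
        if o == a:
            continue  # this channel cannot distinguish the two colors; skip it
        value += w * (p - a) / (o - a)   # assumed per-channel term, e.g. CR
        used_weight += w
    if used_weight == 0.0:
        raise ValueError("aqua and orange calibrations must differ in a weighted channel")
    value /= used_weight                 # renormalise if a channel was skipped
    return min(1.0, max(0.0, value))     # clamp: an assumption, not from the text
```

A pixel whose color is exactly halfway between the two calibrated colors then evaluates to 0.5, matching the half aqua, half orange case discussed for the ideal value U.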

This value V is compared with the ideal value U, which is equal to the percentage of
orangeness calculated assuming the angle of the provisional line is the same as that of
the ideal line. For example, a pixel which is crossed by the line in the exact middle
would have a U of 0.5, since it is 50% aqua and 50% orange. A fit of U-V in the column
(or row) in the vicinity of the crossing of the provisional line gives a new estimate of the
location of the true edge crossing. Finally, the set of these crossing points can be fit with
a line or gentle curve for each of the three edges, and the 3 vertices can be computed
from the intersections of these lines or curves.

We can now use these three accurate vertices in the camera plane
(F0,G0,F1,G1,F2,G2) together with the lens formula (here we will use the simple lens
formula for brevity) to relate the x and y of the target to F and G:

$$F = \lambda x / z; \qquad G = \lambda y / z$$

where $\lambda$ is the focal length and z is the perpendicular distance from the lens to a
location on the target. A triangle on the target is initially defined as lying in a plane
parallel to the lens plane. The preferred configuration has one right triangle whose right
angle is defined at x0, y0, z0 with one edge (of length A) extending along the direction of
the F axis of the camera and with the other edge (of length B) extending along the direction
of the G axis of the camera. The actual target orientation is related to this orientation with
the use of Euler angles $\phi$, $\theta$, $\psi$. Together with the lens equations and the Euler
equations, the 6 derived data values of the 3 vertices (F0, G0, F1, G1, F2, G2) can be
used to define 6 values of location and orientation of the target. The location and
orientation of a point of interest on any tool or object rigidly attached to this target can
be easily computed from calibration data and ordinary translation and rotation
transformations. Refinements to handle lens distortions can be made by forming a
correction function with calibration data that modifies the locations of the F and G data.

The Euler formulation is nonlinear. We linearize the equations by assuming initially that
the angles have not changed much since the last video frame. Thus we replace $\phi$ with
$\phi$(old) + U1, $\theta$ with $\theta$(old) + U2, $\psi$ with $\psi$(old) + U3, and z0 with z0(old) + U4, or:

$$
\begin{aligned}
\phi &= \phi_{\mathrm{old}} + U_1 \\
\theta &= \theta_{\mathrm{old}} + U_2 \\
\psi &= \psi_{\mathrm{old}} + U_3 \\
z_0 &= z_{0,\mathrm{old}} + U_4
\end{aligned}
$$



Substituting these into the Euler equations and applying the lens formulas leads to a
matrix equation

$$S U = R$$

that can be solved for the U values with a standard method such as a Gauss-Jordan
routine. The angles and z0 can be updated iteratively until convergence is achieved.
The coefficients of the matrix are defined as:

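$$
\begin{aligned}
s_{11} &= -A\left(\cos\phi\left(\tfrac{F_1}{\lambda}\cos\psi + \sin\psi\right) - \sin\phi\cos\theta\left(\tfrac{F_1}{\lambda}\sin\psi - \cos\psi\right)\right) \\
s_{12} &= A\sin\theta\cos\phi\left(\tfrac{F_1}{\lambda}\sin\psi - \cos\psi\right) \\
s_{13} &= A\left(\sin\phi\left(\tfrac{F_1}{\lambda}\sin\psi - \cos\psi\right) - \cos\phi\cos\theta\left(\tfrac{F_1}{\lambda}\cos\psi - \sin\psi\right)\right) \\
s_{14} &= (F_0 - F_1)/\lambda \\
s_{21} &= A\left(\tfrac{G_1}{\lambda}\left(-\cos\phi\cos\psi + \sin\phi\sin\psi\cos\theta\right) + \sin\theta\sin\phi\right) \\
s_{22} &= A\cos\phi\left(\tfrac{G_1}{\lambda}\sin\theta\sin\psi - \cos\theta\right) \\
s_{23} &= \tfrac{G_1}{\lambda}\,A\left(\sin\psi\sin\phi - \cos\psi\cos\theta\cos\phi\right) \\
s_{24} &= (G_0 - G_1)/\lambda \\
s_{31} &= 0 \\
s_{32} &= -B\cos\theta\left(\tfrac{F_2}{\lambda}\sin\psi - \cos\psi\right) \\
s_{33} &= -B\sin\theta\left(\tfrac{F_2}{\lambda}\cos\psi + \sin\psi\right) \\
s_{34} &= (F_0 - F_2)/\lambda \\
s_{41} &= 0 \\
s_{42} &= -B\left(\tfrac{G_2}{\lambda}\sin\psi\cos\theta + \sin\theta\right) \\
s_{43} &= -B\,\tfrac{G_2}{\lambda}\sin\theta\cos\psi \\
s_{44} &= (G_0 - G_2)/\lambda
\end{aligned}
$$

and the right hand side vector is defined as:

$$
\begin{aligned}
r_1 &= (F_1 - F_0)\,z_0/\lambda + A\left(\tfrac{F_1}{\lambda}\left(\cos\psi\sin\phi + \cos\theta\cos\phi\sin\psi\right) + \sin\psi\sin\phi - \cos\theta\cos\phi\cos\psi\right) \\
r_2 &= (G_1 - G_0)\,z_0/\lambda + A\left(\tfrac{G_1}{\lambda}\left(\cos\psi\sin\phi + \cos\theta\cos\phi\sin\psi\right) + \sin\theta\cos\phi\right) \\
r_3 &= (F_2 - F_0)\,z_0/\lambda + B\sin\theta\left(\tfrac{F_2}{\lambda}\sin\psi - \cos\psi\right) \\
r_4 &= (G_2 - G_0)\,z_0/\lambda + B\left(\tfrac{G_2}{\lambda}\sin\theta\sin\psi - \cos\theta\right)
\end{aligned}
$$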
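The iterative update can be sketched as a short loop. This is a generic illustration under assumptions: `build_s` and `build_r` are hypothetical callables that evaluate the 4 x 4 matrix S and the 4-vector R at the current parameter estimate (the three Euler angles and z0), and NumPy's general linear solver stands in for the Gauss-Jordan routine mentioned above.

```python
import numpy as np

def refine_pose(params, build_s, build_r, max_iter=20, tol=1e-9):
    """Iterate params <- params + U, where S U = R, until the correction U is negligible.

    params  -- starting values (three angles and z0), e.g. from the previous video frame
    build_s -- hypothetical function returning the 4x4 matrix S for the current params
    build_r -- hypothetical function returning the right hand side vector R
    """
    p = np.asarray(params, dtype=float).copy()
    for _ in range(max_iter):
        u = np.linalg.solve(build_s(p), build_r(p))  # solve the linearized system
        p += u                                       # apply corrections U1..U4
        if np.max(np.abs(u)) < tol:
            break                                    # converged
    return p
```

Starting from the previous frame's values keeps the corrections small, which is the condition under which the linearization is expected to hold.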

After convergence the remaining parameters x0 and y0 are defined from the equations:

$$x_0 = F_0\, z_0 / \lambda$$

$$y_0 = G_0\, z_0 / \lambda$$

The transition of pronounced colors can yield considerably more information than a
black/white transition, and is useful for the purpose of accurately calculating position
and orientation of an object. As color cameras and high capacity processors become
inexpensive, the added information provided can be accessed at virtually no added
cost. And very importantly, in many cases color transitions are more pleasing to look at
for the user than stark black and white. In addition, the color can be varied within the
target to create additional opportunities for statistically enhancing the resolution with
which the target can be found.

Problems in 3 Dimensional input to computers

Today, input to a computer of Three Dimensional (3D) information is often
painstakingly done with a 2 Dimensional device such as a mouse or similar device. This
artifice is unnatural both for the human and for the program and its interaction with the
human, and CAD designers working with 3D design systems require many years of
experience to master the skills needed for efficient design using same.

A similar situation exists with the very popular computer video games, which are
becoming ever more 3 Dimensional in content and graphic imagery, but with similar
limitations. These games too have heretofore not been natural for the player(s).

"Virtual reality" too requires 3D inputs for head tracking, movement of body parts and
the like. This has led to the development of a further area of sensor capability, which
has resulted in some solutions which are either cumbersome for the user, expensive, or
both.

The limits of computer input in 3D have also restricted the use of natural type situations
for teaching, simulation in medicine, and the like. They further limit young children, older
citizens, and disabled persons from benefiting from computer aided living and work.

Another aspect is digitization of object shapes. There are times that one would like to
take a plastic model or a real world part as a starting point for a 3D design. Prior art
devices that capture 3D shapes are, however, expensive and cumbersome and cannot,
like the invention, share their function for replacement of the mouse or 2D graphic
tablet.

We propose one single inexpensive device that can give all of this control and also act
as a drawing pad, or input 3D sculptured forms, or even allow the user to use real clay
such that, as she sculptures it, the computer records the new shape.

The invention as here disclosed relates physical activities and physical objects directly
to computer instructions. A novice user can design a house with a collection of targeted
model or "toy" doors, windows, walls etc. By touching the appropriate toy component
and then moving and rotating her hand, the user can place the component at the
appropriate position. The user can get his or her visual cue either by looking at the
position of the toy on the desk or by watching the corresponding scaled view on the
computer display. Many other embodiments are also possible.



This figure illustrates an embodiment wherein the invention is used to "work"
on an object, as opposed to pointing or otherwise indicating commands or
actions. It is a computer aided design (CAD) system embodiment according to the
invention which illustrates several basic principles of optically aided computer inputs
using single or dual/multi-camera (stereo) photogrammetry. Illustrated are new forms of
inputs to effect both the design and simulated assembly of objects.



3D Computer Aided Design (CAD) was one of the first areas to bump up against the
need for new 3D input and control capability. A mouse or, in the alternative, a 2D
graphic tablet, together with software that displays several different views of the design,
is the current standard method. The drawback is that you are forced to move along 2D
planes defined by display views or what are known as construction views of the design
object.

This situation is especially frustrating when you start creating a design from scratch.
The more sculptured the design, the more difficult this becomes. The current CAD
experience feels more like an astronaut in a space suit with bulky fingertips and limited
visibility trying to do delicate surgery.

A large number of specialized input devices have been designed to handle some of
these problems but have had limited success. Just remember your own frustrations with
the standard mouse; imagine attempting to precisely and rapidly define and control
complex 3D shapes all day, every day. This limits the usefulness of such design tools to
only a relatively rare group, and not the population as a whole.

Ideally we want to return to the world we experience every day, where we simply reach
out our hand to select what we want to work with, turn it to examine it more closely, move
and rotate it to a proper position to attach it to another object, find the right location and
orientation to apply a bend of the proper amount and orientation to allow it to fit around
another design object, capture 3D real world models, or stretch and sculpture designs.

One of the most wonderful properties of this invention is that it gives the user the ability
to control not only 3D location with the motion of his hand, but also 4 other pieces
of data (3 orientation angles and time) that can be applied to control parameters. For
example, if we wanted to blend 2 designs (say a Ferrari and a Corvette) to create a new
design, this process could be controlled simply by

1) moving the user's hand from left to right to define the location of the cross section to
be blended,

2) tilting the hand forward to define the percentage "P" used to blend the 2 cross
sections, and

3) hitting the letter R on the keyboard to record items 1 and 2. From each of the 2 cross
sectional curves, define a set of (x, y) coordinates and create a blended cross
sectional coordinate set as follows:

X (blend) = P * X (Ferrari) + (1-P) * X (Corvette)
Y (blend) = P * Y (Ferrari) + (1-P) * Y (Corvette)
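The blending formulas can be written out as a short function. The sketch assumes that both cross sectional curves have been sampled at the same number of corresponding points and that P is expressed as a fraction between 0 and 1; the function and argument names are illustrative only.

```python
def blend_sections(ferrari, corvette, p):
    """Blend two cross sections point by point: P * Ferrari + (1 - P) * Corvette.

    ferrari, corvette -- lists of (x, y) points, in corresponding order (assumed)
    p                 -- blend fraction P set by the tilt of the hand, 0 to 1
    """
    if len(ferrari) != len(corvette):
        raise ValueError("cross sections must have the same number of points")
    return [
        (p * xf + (1 - p) * xc, p * yf + (1 - p) * yc)
        for (xf, yf), (xc, yc) in zip(ferrari, corvette)
    ]
```

With P = 1 the result is the Ferrari section unchanged, and with P = 0 it is the Corvette section.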

Note that here and elsewhere, keystrokes can be replaced if desired by voice commands,
assuming suitable voice recognition capability in the computer.

In the apparatus of fig 1, we desire to use a touching and indicating device 216 with
action tip 217 and multidegree of freedom enabling target 215 that the user holds in her
hand. Single targets or multiple targets can be used with a camera system such as 206
so as to provide up to 6 axis information of pointing device position and orientation vis a
vis the camera reference frame and, by matrix transform, any other coordinate
system such as that of a TV display, 220.

In using the invention in this form, a user can send an interrupt signal from an "interrupt
member" (such as by pressing a keyboard key) to capture a single target location and
orientation or a stream of target locations (ended with another interrupt). A computer
program in the computer determines the location and orientation of the target. The location
and orientation of the "action tip" 217 of the pointing device can be computed with
simple offset calculations from the location and orientation of the target or target set.
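The offset calculation for the action tip can be sketched as a rotation followed by a translation. The angle convention used here (roll about x, pitch about y, yaw about z, applied in that order) is an assumption made for the example and is not necessarily the Euler convention of the method described earlier; `tip_offset` is a hypothetical calibration vector giving the tip position in the target's own frame.

```python
import numpy as np

def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix for roll (x), pitch (y), yaw (z) in radians, composed as Rz @ Ry @ Rx."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx

def action_tip(target_position, roll, pitch, yaw, tip_offset):
    """Tip position in the camera frame from the target pose and a fixed offset."""
    rotated = rotation_zyx(roll, pitch, yaw) @ np.asarray(tip_offset, dtype=float)
    return np.asarray(target_position, dtype=float) + rotated
```

The same pattern, with a further rotation and translation, would carry the tip position from the camera reference frame into another coordinate system such as that of the display.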


The set of tip 217 locations defines the 3D shape of the real world object 205. Different 
targeted tools with long or curved extensions to their action tips can be used to reach 
around the real world object while maintaining an attached target in the target volume 
so the cameras can record its location/orientation. 


By lifting the tip of the pointing device off the surface of the object, the user can send 
location and orientation information to operate a computer program that will deform or 
modify the shape of the computer model displayed. Note that the user can deform a 
computer model even if there is no real world object under the tip. The tip location and 
20 orientation can always be passed to the computer program that is deforming the 
computer model. 

The same device can be used to replace graphic tablets, mice, or white boards, or can be
used in conjunction with a display screen, turning it into a form of touch screen (as
previously, and further, discussed herein). In one mode, interrupt members can be
activated (i.e. a button or keyboard key etc. can be pressed) like mouse buttons. These
together with the target ID can initiate a computer program to act like a pen or an eraser
or a specific paintbrush or spray can with width or other properties. The other target
properties (z, or orientation angles) can be assigned to the computer program's pen,
brush or eraser, letting the user dynamically change these properties.

Target(s) can be attached to a user's hand or painted on her nails using retroreflective
nail polish, for example, allowing the user to quickly move her hand from the
keyboard to allow a camera or cameras and computer like that of fig 1 to determine the
position and orientation in 2D or 3D of a computer generated object on the display,
to set the view direction or zoom, or to input a set of computer parameters or computer
instructions. This can all be done with the same device that we described in the above
figures.

A major advantage is that this is done without having to grab a mouse or other device.
Finger tips can be tracked in order to determine a relative movement such as a grasping
motion of the fingers, further described in fig 6. Similarly the relation of, say, one finger to
the nail of the other hand can be seen.

Suitable indication can be the nail or natural image of the finger itself if suitable
processing time and data processing power is available. However, as pointed out
above, results today are expeditiously and economically best achieved by using easily
identified, and preferably bright, indicia such as retroreflective items, brightly colored or
patterned items, unusually shaped items or a combination thereof.

One can also modify or virtually modify the thing digitized with the tools disclosed. A
single computer can both process the optical input and run the computer application
software, or a group of computers can process the optical data to obtain the location and
orientation of the targets over time and pass that information to the application software
in a separate computer.


The object 205 is shown being digitized with the simple pointer 216, though different
tools could be used. For example, additional tools which could be used to
identify the location and orientation of a 3D object are: a long stemmed pointer to work
behind an object, pointers designed to reach into tight spaces or around features, and
pointers to naturally slide over round surfaces or planar corners. Each time the
"activation member" is triggered, the camera system can capture the location and
orientation of the target as well as its ID (alternatively one could enter the ID
conventionally via a keyboard, voice or whatever). The ID is used to look up in the
associated database the location of the "work tip". The 3D coordinates can then be
passed to the application software to later build the 3D data necessary to create a
computer model of the object. When working on the back of the object furthest from the
cameras, the object may obscure the camera view of the target on the simple tool. Thus
the user may switch to the long stem tool or the curved stem tool that are used to get
around the blocking geometry of the object. Other pointers can be used to reach into
long crevices.
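
By way of illustration only, the work-tip lookup described above can be sketched as follows. The database contents, the target IDs, the offset values, and the function name work_tip_position are assumptions made for this sketch and are not taken from the disclosure; the target pose is assumed to be delivered by the camera system as a 3x3 rotation matrix and a translation vector in camera coordinates.

```python
# Hypothetical tool database: target ID -> work-tip offset expressed in the
# target's own coordinate frame (millimetres). All values are illustrative.
TOOL_DB = {
    1: (0.0, 0.0, -120.0),   # simple pointer
    2: (0.0, 40.0, -250.0),  # long stemmed pointer
}


def work_tip_position(target_id, rotation, translation, tool_db=TOOL_DB):
    """Return the work-tip location in camera coordinates.

    rotation is a 3x3 row-major matrix and translation a 3-vector; together
    they describe the measured pose of the target.
    """
    offset = tool_db[target_id]
    return tuple(
        sum(rotation[r][c] * offset[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Each trigger of the activation member appends one digitized surface point.
digitized_points = []
digitized_points.append(work_tip_position(1, IDENTITY, (10.0, 20.0, 500.0)))
```

Because the offset is stored per target ID, switching to the long stemmed tool changes only the database entry used; the digitizing loop itself is unchanged.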

Let's examine the term "activation member". This can be any signal to the computer
system that it should initiate a new operation such as collecting one or more data points,
storing the information, looking up information in the associated databases, etc. Examples
of the activation member are a button or foot pedal electronically linked to the computer,
a computer keyboard whose key is depressed, a trigger turning on a light or set of
lights on a target, or a sound or voice activation.

Another method of acquiring a 3D shape is to slide a targeted tool over the object,
acquiring a continuous stream of 3D coordinates that can be treated as a 3D curve.
These curves can later be processed to define the best 3D model to fit these curves.
Each curve can be identified as either being an edge curve or a curve on the general
body surface by hitting the previously defined keyboard key or other activation member.
This method is extremely powerful for capturing clay modeling as the artist is performing
his art. In other words, each sweep of his fingers can be followed by recording the path
of a target attached to his fingers. The target ID is used to look up in the associated
database the artist's finger width and the typical deformation that his fingers experience
on a sweep. He can change targets as the artwork nears completion to compensate for
a lighter touch with less deformation.

Figure 2b



Figure 2b illustrates how targeted tools can be used in a CAD system or other computer
program. A targeted work tool can be a toy model of the real world tool 280 (a toy drill
for example) or the tool itself 281 (a small paint brush), helping the user immediately
visualize the properties of the tool in the computer program. Note that any targeted tool
can be "aliased" by another tool. For instance, the tip of the brush could be redefined
inside the computer program to act like the tip of a drill. The location and orientation of
the drill tip, as well as drill parameters such as its width, can be derived from the
target together with its path and interrupt member information. The user can
operate his CAD system as though he were operating a set of workshop or artist tools
rather than traversing a set of menus.

The work tool and an object to be worked on can be targeted, and sensed either
simultaneously or one after the other. Their relative locations and orientations can be
derived, allowing the user, for example, to "whittle" her computer model of the object 285
that she has in one hand with the tool 286 that is in the other hand.

Also a set of objects that are part of a house design process such as a door, a window, 
a bolt or a hinge could be defined quickly without having the user traverse a set of 
menus. 

This device can perform an extremely broad range of input tasks for manipulation of 2D 
or 3D applications. 

The devices that are used today for such activity are typically a mouse or a graphic
tablet. Both of these devices really tend to work only in two dimensions. Everyone has
had the experience with the mouse where it slips or skips over the mouse pad, making it
difficult to accurately position the cursor. The graphic tablet is somewhat easier to
manipulate but it is bulky, covering up the desktop surface.

The disclosed invention can replace either of these devices. It never gets stuck since it
moves in air. We can attach a target to the top of one of our hands or paint our
fingernails and have them act as a target. Alternatively, for example, we can pick up a
pointing device such as a pencil with a target attached to the top of it. By merely moving
our hand from side to side in front of the camera system we can emulate a mouse. As
we move our hand forward and backward, a software driver in our invention would
emulate a mouse moving forward or backward, making input using known interface
protocols straightforward. As we move our hand up and down off the table (something
that neither the graphic tablet nor the mouse can do) our software driver can recognize
a fully three-dimensional movement.

Much of the difficulty with computer-aided design software comes from one's inability
heretofore to move naturally around our computer object. We see a three-dimensional
design projected onto the two-dimensional computer display and we attempt to move
around our three-dimensional design using two-dimensional input devices such as a
mouse or computer graphic tablet. Design would be so much easier if we could simply
move our hand in a three-dimensional region to both rotate and locate design
information.

One example of a design session using this invention

To more concretely describe this invention we will discuss one of many possible
implementations:

- painted fingernails on one's hand will act as the targets;

- the computer keyboard will indicate which commands I am performing.

Targets can also be attached to objects, tools, and hands. Commands can be entered
by voice, buttons, other member manipulations, or even by the path of a target itself.

An example of a sequence of actions is now described. The specific keys picked for this
example are not a restriction of this invention. In a further embodiment, means of
triggering events other than keyboard strokes are disclosed.

Example of CAD usage with targeted tools and objects together with voice recognition 
activated member 

Say "start" to begin using the invention. 

Say "rotate view" and rotate the targeted hand inside the target volume until the view on
the computer display is in the direction that you choose. In the same sense that a small
motion of the mouse is scaled up or down to the useful motion in the design software, a
small motion or rotation of the targeted hand can be scaled. Consider the target to be
composed of three separate retroreflective fingernail targets. By rotating the plane
formed by the three fingernails five degrees to the left we could make the display view
on the screen rotate by, say, 45 degrees. We could also use the distance between one's
fingers to increase or decrease the sensitivity to the hand rotation. Thus, if one's three
fingers were close together a 5-degree turn of one's hand might correspond to a
5-degree turn on the screen, while if one's fingers were widely spread apart a 5-degree
turn might correspond to a 90-degree turn on the screen. Say "freeze view" to fix the new
view.
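
The spread-dependent sensitivity just described can be sketched as below. This is a minimal illustration, assuming a linear interpolation between the two cases given in the text (a gain of 1 with fingers close together, and a gain of 18, i.e. 5 degrees becoming 90 degrees, with fingers widely spread). The spread limits in millimetres and the function names are hypothetical.

```python
def rotation_gain(spread_mm, close_mm=20.0, wide_mm=120.0,
                  min_gain=1.0, max_gain=18.0):
    """Map fingernail spread to a rotation gain (limits are illustrative)."""
    frac = (spread_mm - close_mm) / (wide_mm - close_mm)
    frac = max(0.0, min(1.0, frac))  # clamp outside the taught range
    return min_gain + frac * (max_gain - min_gain)


def view_rotation_deg(hand_rotation_deg, spread_mm):
    """Screen rotation produced by a given rotation of the fingernail plane."""
    return hand_rotation_deg * rotation_gain(spread_mm)
```

Any monotonic mapping would serve equally well; the linear form is chosen here only because it is the simplest one that reproduces both end points.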

Move the hand inside the target volume until a 3D cursor falls on top of the display of
a computer model and then say "select model".

Say "rotate model" and a rotation of the user's hand will cause the selected computer 
model to be rotated. Say "freeze model" to fix the rotation. 

Say "select grab point" to select a location to move the selected model by.

Say "move model" to move the selected model to a new location. Now the user can
move this model in his design merely by moving his hand. When the proper location and
orientation are achieved say "freeze model" to fix the object's position. This makes CAD
assembly easy.

Say "start curve" and move the targeted hand through the target volume in order to define a
curve that can be used either as a design edge or as a path for the objects to follow. By
moving the fingers apart the user can control various curve parameters. Say "end
curve" to complete the curve definition.

Pick up a model door that is part of a set of design objects, each of which has its own
unique target and target ID. Move the targeted object in the target volume until the
corresponding design object in the software system is oriented and located properly in
the design. Then say "add object". The location and orientation of the model door
together with the spoken instruction will instruct the CAD program to create a door in the
computer model. Moving the targeted fingers apart can vary parameters that define
the door (such as height or width).

Pick up a targeted model window and say "add object". The location and orientation of
the model window together with the key hit will instruct the CAD program to create a
window in the computer model.

Say "define parameters" to define the type of window and window properties. The 3
location parameters, 3 orientation parameters, and the path motion can be assigned by
the database associated with the object to control and vary parameters that define the
window in the computer software. Say "freeze parameters" to fix the definition.

Example: Designing a car with targeted tools and objects, together with the keyboard as 
the member giving commands 

Now we apply this to the design of an automobile. The steps are as follows:

1. Pick up a model of a Corvette with a target attached to it and place it in the target
volume.

2. Hit the A key (or provide another suitable signal to the computer, keys being
representative of one type prevalent today) to use the target parameters to define the
object's parameters of interest such as model, year, and make.

3. Pick up a targeted pointer associated with the CAD commands for locating a car part
to work on. Use this specialized pointer's target ID together with hitting the L key to
define a view of the car, where the orientation and location of the target define the view
orientation and the location of the camera. If the target defines a camera position inside
the car, the design information behind the camera will not be displayed. The motion of the
special pointer after the hit could indicate other commands without the use of a keyboard
hit. For instance, a forward or backward tilt could increase or decrease the zoom
magnification of the display. A large tilt to the left could select the object under the cursor
and a large tilt to the right could deselect the object under the cursor. In a CAD system
this selection could mean display that part for examination, while in an inventory system it
could mean deliver this part.

Consider that the part selected for redesign in a CAD system was the hood. The user picks up
a targeted curvy wire. The invention will recognize the target ID as that of a curve line
cross section command, and when the user hits any key (or gives a voice command or
other suitable signal) the location and orientation of the target are determined and the
computer program will cause a cross section curve of the hood to be acquired at the
corresponding location and orientation. The CAD system will then expect a series of
keystrokes and target paths to define a new cross section leading to a modified hood
design.

Hit the M key and draw a small curve segment to modify the previously drawn curve.
Hit the M key again to fix the modification.

Hit the F key to file down the hood where it seems to be too high. This is accomplished
by moving the targeted fingers back and forth below some specified height above a
surface (for example one-inch height above the desktop). Lower the fingers and
move the target or targeted hand forward or backward. This can be linked to the surface
definition in the CAD system, causing the surface to be reduced as though a file or
sander were being used. The lower the fingers, the more material is removed on each
pass. Likewise, moving the fingers above one inch can be used to add material to the
hood. Spreading the targeted fingers can increase the width of the sanding process.
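
As a rough sketch of the filing behavior just described, the surface can be treated as a row of height samples that each pass raises or lowers. The one-inch threshold comes from the example above; the linear rate, the one-dimensional surface representation, and the names material_change_mm and apply_pass are assumptions introduced only for illustration.

```python
def material_change_mm(finger_height_in, threshold_in=1.0, rate_mm_per_in=0.5):
    """Height change per pass: negative removes material, positive adds it.

    Fingers below the threshold remove material, and the lower they are the
    more is removed; fingers above the threshold add material.
    """
    return (finger_height_in - threshold_in) * rate_mm_per_in


def apply_pass(surface, path_cells, finger_height_in, finger_spread_cells):
    """Return a new surface after one pass of the targeted fingers.

    surface is a list of heights (mm); path_cells are the sample indices the
    fingers passed over; finger_spread_cells sets the width of the pass.
    """
    delta = material_change_mm(finger_height_in)
    half = finger_spread_cells // 2
    touched = set()
    for cell in path_cells:
        for i in range(cell - half, cell + half + 1):
            if 0 <= i < len(surface):
                touched.add(i)
    return [h + delta if i in touched else h for i, h in enumerate(surface)]


hood = [10.0, 10.0, 10.0, 10.0, 10.0]
filed = apply_pass(hood, path_cells=[2], finger_height_in=0.5,
                   finger_spread_cells=3)
```

A real CAD surface would of course be two-dimensional and smoothed; the sketch only shows how finger height and finger spread map onto depth and width of the pass.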

A user can acquire a 3D model (plastic, clay, etc.) by hitting the C key and rubbing either
targeted fingers or a hand-held targeted sculpture tool over the model. From the path of
the targeted fingers or tool we can compute the surface by applying the offset
characteristics of the targeted tool. If the 3D object is made of a deformable material
such as clay, the CAD system can reflect the effect of the fingers or tool passing over
the model on each pass. If we want, we can add some clay on top of the model to
build up material where we need it. Thus we can tie art forms such as clay modeling
directly into CAD or other computer systems.

We can use targeted tools such as drills, knives, trowels, and scalpels to modify the clay
model and thus its associated CAD model. The target ID will allow the computer to
check the associated database to determine where the tip is relative to the target and
define how the path of the target would result in the tool affecting the CAD model.

Notice that we can use these tools in the same manner even if there's no clay model or 
other real world model to work on. Also notice that these tools could be simple targeted 
sticks but the CAD model would still be affected in the same way. 

Figure 3

Figure 3 illustrates additional embodiments working virtual objects, and additional alias
objects according to the invention. For example, a first object can be a pencil, with the
second object a piece of paper. It also illustrates how we can use computer image
determined tool position and orientation (targeted or otherwise) to give the user tactile
and visual feedback as to how the motion, location, and orientation of the tool will affect
the application computer program.

The user of the computer application program may have several tools that she feels
comfortable with on her desk. An artist for instance might have a small paintbrush, a
large paintbrush, a pen, an eraser, and a pencil. Each of these would have a unique
target attached to it. The artist would then pick up the tool that she would normally use
and draw over the surface of a sheet of paper, or over the surface of a display screen or
projection of a computer display. The application software would not only trace the path of
the tip of the targeted work tool, but also treat the tool as though it were a pen or
paintbrush etc. The exact characteristics of the pen would be found in the associated
database using the target ID as a lookup key. Extra parameters such as the width of
the line, its color, or whether it's a dashed line could be determined by keyboard input or
by applying the height or target orientation parameters.

If the artist did not own a tool that he needed he could "alias" this tool as follows.

Suppose that the artist is missing a small paintbrush. He can pick up a pen, move it into
the target volume, and signal the target acquisition software, such as by typing on the
computer's keyboard the letter Q followed by the ID number of the small paintbrush.
From this point on the computer will use the database entry associated with the small
paintbrush instead of that of the pen.
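
The aliasing step above amounts to redirecting one target ID to another tool's database entry. A minimal sketch follows, in which the class name ToolRegistry, the ID numbers, and the stored properties are all hypothetical and chosen only for illustration.

```python
class ToolRegistry:
    """Maps an observed target ID to the tool properties the program applies."""

    def __init__(self, tools):
        self._tools = dict(tools)  # target ID -> properties of the real tool
        self._alias = {}           # observed target ID -> ID to act as

    def alias(self, seen_id, act_as_id):
        """Make the tool carrying seen_id behave like the tool act_as_id."""
        if act_as_id not in self._tools:
            raise KeyError(f"unknown tool ID {act_as_id}")
        self._alias[seen_id] = act_as_id

    def properties(self, seen_id):
        """Look up the properties to apply for an observed target ID."""
        return self._tools[self._alias.get(seen_id, seen_id)]


# Illustrative database: 12 is a pen, 31 is the missing small paintbrush.
registry = ToolRegistry({
    12: {"kind": "pen", "line_width_mm": 0.5},
    31: {"kind": "small paintbrush", "line_width_mm": 3.0},
})
registry.alias(12, 31)  # e.g. after the user types Q followed by 31
```

Only the lookup is redirected; the pen's own database entry is left intact so that the alias can later be removed.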

Specifically we are illustrating several concepts: 

This invention gives the user the natural tactile and visual feedback that she is used to
in her art. Thus an artist would use targeted versions of the very tools, such as pens
306, paintbrushes 305, and erasers 310, that she uses without a computer.

By drawing with a targeted tool (e.g. 336, having target 337) on a paper pad (e.g. 350
shown in fig 3b, with target 342) or canvas, the user again continues to experience the
traditional non-computer art form as a computer interface (targets in multiple corners of
the paper can also be used for added resolution of paper location with respect to the
tool). The user would see her art drawn on the paper while creating a computer version
with all of the editing and reproduction capabilities implied by computers. The targeted
tool's motion relative to the targeted paper is what determines the line in the graphics
system. Thus the user could even put the pad in her lap and change her position in a
chair and properly input the graphic information as she draws on the paper, as long as
the targets continue to be in the view of the camera system.
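
Because only the tool's motion relative to the paper matters, the tool tip can be re-expressed in the paper's own coordinate frame before the line is recorded. The sketch below assumes the camera system reports the paper target's pose as a 3x3 rotation matrix (whose columns are the paper's axes in camera coordinates) and an origin. The function name to_paper_frame and the numeric values are illustrative assumptions.

```python
def to_paper_frame(tip_cam, paper_rotation, paper_origin):
    """Express a tool-tip position (camera coordinates) in the paper's frame.

    The inverse of a pure rotation is its transpose, so the offset from the
    paper origin is multiplied by the transposed rotation matrix.
    """
    d = [tip_cam[i] - paper_origin[i] for i in range(3)]
    return tuple(
        sum(paper_rotation[r][c] * d[r] for r in range(3)) for c in range(3)
    )


# Paper rotated 90 degrees about the camera z axis (illustrative pose).
ROT_Z90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

in_lap = to_paper_frame((100.0, 60.0, 400.0), ROT_Z90, (100.0, 50.0, 400.0))
on_desk = to_paper_frame((0.0, 10.0, 900.0), ROT_Z90, (0.0, 0.0, 900.0))
```

Both calls describe the same pen position on the pad, so the drawn line is unaffected when the pad is moved, provided the targets stay in view.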

By drawing directly on a computer display, such as shown in figure 12, or on a transparent
cover over a computer display, the user can make the targeted tool manipulate the
computer display and immediately get feedback on how the graphics are affected.
Again the art form will seem to match the traditional non-computer experience.

Parameters such as line width, line type, etc. can be controlled by the target
parameters that are not used to determine the path of the line (usually this would be the
target height and orientation).

This invention allows the user to "alias" any object with any other object. 

This invention allows users to control computer programs by moving targeted objects
around inside the target volume rather than having to learn different menu systems for
each software package. Thus a child could quickly learn how to create 3D CAD
designs by moving targeted toy doors 361, windows 362, drills 360, and pencils. With
the use of macros found in most systems today, a user would create a hole in an object
the same way on different CAD systems by moving, say, a tool such as a drill, starting at
the proper location and orientation and proceeding to the proper depth.

An example of a Quant that could be used to define a command in a CAD or drawing
system to create a rectangle might proceed as follows:

Hit the Q key on the keyboard to start recording a Quant. 

Sweep the target to the right, punctuated with a short stationary pause. During the pause
analyze the vector direction for the path segment initiated with the Q key and
ending with the pause. The first and last points of this segment define a vector direction
that is mainly to the right with no significant up/down or in/out component. Identify this as
direction 1.

Sweep the target upward, punctuated with a short stationary pause. During the pause
analyze the vector direction for the path segment initiated with the last pause
and ending with the next pause. The first and last points of this segment define a vector
direction that is mainly upward with no significant left/right or in/out component. Identify
this as direction 2.

Sweep the target to the left, punctuated with a short stationary pause. During the pause
analyze the vector direction for the path segment initiated with the last pause
and ending with the next pause. The first and last points of this segment define a vector
direction that is mainly to the left with no significant up/down or in/out component.
Identify this as direction 3.

Sweep the target down, punctuated with a short stationary pause. During the pause
analyze the vector direction for the path segment initiated with the last pause
and ending with the next pause. The first and last points of this segment define a vector
direction that is mainly down with no significant left/right or in/out component. Identify
this as direction 4.

End the Quant acquisition with a key press "a" that gives additional information to
identify how the Quant is to be used.

In this example the Quant might be stored as a compact set of 7 numbers and letters (4,
1, 2, 3, 4, a, 27) where 4 is the number of path segments, 1-4 are numbers that identify
path segment directions (i.e. right, up, left, down), "a" is the member interrupt (the key
press a), and 27 is the target ID. Figure 7a illustrates a flow chart as to how target paths
and Quants can be defined.
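
The rectangle Quant above can be sketched in code as follows. The sketch assumes that the target positions recorded at the start and at each pause are available as (x, y, z) points with x to the right and y upward. The dominance ratio used to decide that a segment has "no significant" off-axis component, the codes 5 and 6 for in/out motion, and the function names are assumptions introduced for illustration only.

```python
# Direction codes 1-4 follow the text (right, up, left, down); codes 5 and 6
# for motion along z are hypothetical additions.
DIRECTION_CODES = {
    ("x", 1): 1, ("y", 1): 2, ("x", -1): 3, ("y", -1): 4,
    ("z", 1): 5, ("z", -1): 6,
}


def classify_segment(first, last, dominance=3.0):
    """Return the direction code of a path segment, or None if ambiguous."""
    delta = [b - a for a, b in zip(first, last)]
    axis = max(range(3), key=lambda i: abs(delta[i]))
    others = [abs(delta[i]) for i in range(3) if i != axis]
    main = abs(delta[axis])
    if main == 0 or main < dominance * max(others):
        return None  # no clearly dominant direction
    sign = 1 if delta[axis] > 0 else -1
    return DIRECTION_CODES[("xyz"[axis], sign)]


def encode_quant(pause_points, key, target_id):
    """Build the compact Quant tuple from the points recorded at each pause."""
    codes = [classify_segment(a, b)
             for a, b in zip(pause_points, pause_points[1:])]
    if None in codes:
        raise ValueError("a path segment has no dominant direction")
    return (len(codes), *codes, key, target_id)


rectangle_path = [(0, 0, 0), (10, 0.5, 0), (10, 8, 0.2), (0.5, 8, 0), (0, 0, 0)]
quant = encode_quant(rectangle_path, "a", 27)
```

With the slightly wobbly path shown, the result is the same 7-element tuple given in the text, (4, 1, 2, 3, 4, "a", 27).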

Figure 4 

Figure 4 illustrates a car driving game embodiment of the invention, which in addition
illustrates the use of target-based artifacts and simplified head tracking with viewpoint
rotation. The car dash is, for example, a plastic model purchased or constructed to
simulate a real car dash, or can even be a make-believe dash (i.e. in which the dash is
made from, for example, a board, and the steering wheel from a wheel from a wagon or
other toy, or even a dish), and the car is simulated in its actions via computer imagery
and sounds.

Cameras 405 and 406 forming a stereo pair, and light sources as required (not shown),
are desirably mounted on rear projection TV 409, and are used together with computer
411 to determine the location and orientation of the head of a child or other game
player. The computer provides from software a view on the screen of TV 409 (and
optionally sound, on speakers 413 and 414) that the player would see as he turns his
head, e.g. right or left (and optionally up or down, which is not so important in a car game
driven on a horizontal plane, but important in other games which can be played with the
same equipment but different programs). This viewpoint rotation is provided using the
cameras to determine the orientation of the head from one or more targets 415 attached
to the player's head or, in this case, a hat 416.

In addition, there desirably is also a target 420 on the steering wheel which can be seen
by the stereo pair of cameras 405 and 406. As the wheel is turned, the target moves in a
rotary motion which can be transduced accordingly, or as a compound x and y motion,
by the camera processor system means in computer 411. It is noted that the target 420
can alternately be attached to any object that we choose to act as a steering wheel 421,
such as the wheel of a child's play dashboard toy 425.

A prefabricated plywood or molded plastic dashboard can be supplied having other
controls incorporated, e.g. gas pedal 440, hinged at the bottom with hinge 441 and
preferably providing an elastic tactile feedback, which has target 445 viewed by cameras
405 and 406 such that its y axis position and/or z axis (range) position changes as the
player pushes down on the pedal. This change is sensed and determined by TV based
stereo photogrammetry using the cameras and computer, and the data is then converted by
computer 412 into information which can be used to modify the display or audio signals,
providing simulations of the car's acceleration or speed depicted with visual and auditory
cues.

Similarly, a brake pedal or any other control action can be provided, for example moving
a dashboard lever such as 450 sideways (moving in this case a target on its rear facing
the camera, not shown for clarity, in x axis motion), or turning a dashboard knob such as
455 (rotating a target, not shown, on its rear facing the camera).

As an alternative to purchasing or fabricating a realistic dashboard simulation toy, the child
can use his imagination with the same game software. Ordinary household objects such
as salt shakers with attached targets can serve as the gas pedal, gearshift, or other
controls. A dish with a target, for example, can be treated by the invention as
a steering wheel, without any other equipment used. This makes fun toys and games
available at low cost once computers and camera systems become standard due to
their applicability to a wide variety of applications, at ever lower hardware cost due to
declining chip prices.

One camera system (single or stereo pair or other) can be used to follow all of the
targets at once, or several camera systems can follow separate targets.

To summarize this figure, we have shown the following ideas:



1) This invention can turn toys or household objects into computer controls or game
controls. This is most easily accomplished by attaching one or more special targets
to them, though natural features of some objects can be used.

2) This invention allows us to set up control panels or instrument panels as required
without the complex mechanical and electrical connections and transducers that are
typically required. This lowers the cost and complexity dramatically.

3) The invention allows simplified head tracking with viewpoint rotation.

Some further detail on the embodiment of fig 4 is now given, wherein a boy is seated in front
of a low cost plastic or plywood dashboard to which a targeted steering wheel and gas and
brake pedals are attached (also gear shifts, and other accessories as desired). A target on
the boy's hat is observed, as are the targets on the individual items of the dash, in this
case by a stereo pair of cameras located atop the TV display screen, which is of large
enough size to seem real; for example, a screen of about the dashboard width is preferable.
Retroreflective tape targets of Scotchlite 7615 material are used, illuminated by light
sources in close adjacency to each camera.

Optionally a TV image of the boy's face can also be taken to show him at the wheel,
leaning out the window (likely imaginary), etc.

As noted previously, the boy can move his head from left to right and the computer will
change the display so he sees a different view of his car on the track, or up and down
to move, say, from a driver's view of the road to an overhead view of the course.

Stereo cameras may be advantageously located on a television receiver looking
outward at the back of an instrument panel having targeted levers, switches, a
steering wheel, etc., whose movement and position are determined along with those of the
player, if desired. The panel can be made out of low cost wood or plastic pieces. The
player can wear a hat with targets viewed in the same field of view as the instrument
panel; this allows all data to be obtained in one view. As he moves his head to lean out the
car window, so to speak, the image on the screen changes view (typically in an exaggerated
manner; for example, a small angular head movement might rotate the view 45 degrees in the
horizontal or vertical direction on the screen).

This invention allows one to change the game from cars to planes just by changing the
low cost plastic or wood molded toy instrument panel with its dummy levers, switches,
sliders, wheels, etc. As noted, these actuating devices are, for easiest results, desirably
targeted, for example by retroreflector or LED targets of high visibility and of accurately
determinable position. The display used can be that of the TV, or separately
incorporated (and preferably removable for use in other applications), as with an LCD
(liquid crystal display) on the instrument panel. Multi-person play is possible, and players
can be connected remotely.

Of significance is that all datums usable in this toy car driving simulation game,
including several different driver body point inputs, head position and orientation,
steering wheel position, plus driver gray level image and perhaps other functions as
well, can all be observed with the same camera or multi-camera stereo camera set. This
is a huge saving in the cost of the various equipment otherwise used with high priced arcade
systems to deliver a fraction of the sensory input capability. The stereo camera pair can
also provide TV images which can be displayed in stereo at another site if desired.

Where only a single camera is used to see a single point, depth information in z (from
panel to camera, here on the TV set as shown in fig 4) is not generally possible. Thus
steering wheel rotation is visible as an xy movement in the image field of the camera,
but the gas pedal lever must be, for example, hinged so as to cause a significant x and/or
y change, not just a predominantly z change.

A change in x and/or y can be taught to the system to represent the range of gas pedal
positions, by first engaging in a teach mode where one can, as shown in fig 4, input a
voice command to tell the system that a given position is gas pedal up, gas pedal
down (max throttle), or any position in between. The corresponding image positions of
the target on the gas pedal lever member are recorded in a table and looked up (or
alternatively converted to an equation) when the game is in actual operation, so that the
gas pedal input command can be used to cause imagery on the screen (and audio of
the engine, say) to give an apparent speedup or slowing down of the vehicle. Similarly
the wheel can be turned right to left, with similar results, and the brake pedal lever and
any other control desired can also be so engaged (as noted below, in some cases such
control is not just limited to toys and simulations and can also be used for real vehicles).
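
The teach mode and table lookup described above can be sketched as below. The sketch assumes a single image coordinate (for example the target's y position in pixels) is taught against a throttle value between 0 and 1, and that positions between taught samples are linearly interpolated. The class name TaughtControl and the pixel values are hypothetical.

```python
from bisect import bisect_left


class TaughtControl:
    """Table of taught image positions and the control values they represent."""

    def __init__(self):
        self._samples = []  # sorted list of (image_position, control_value)

    def teach(self, image_pos, value):
        """Record that the target at image_pos means the given control value."""
        self._samples.append((image_pos, value))
        self._samples.sort()

    def lookup(self, image_pos):
        """Return the control value for an observed position during play."""
        positions = [p for p, _ in self._samples]
        if image_pos <= positions[0]:
            return self._samples[0][1]
        if image_pos >= positions[-1]:
            return self._samples[-1][1]
        i = bisect_left(positions, image_pos)
        (p0, v0), (p1, v1) = self._samples[i - 1], self._samples[i]
        return v0 + (v1 - v0) * (image_pos - p0) / (p1 - p0)


gas_pedal = TaughtControl()
gas_pedal.teach(100, 0.0)  # taught while saying "gas pedal up"
gas_pedal.teach(180, 1.0)  # taught while saying "gas pedal down"
throttle = gas_pedal.lookup(140)
```

The same table structure could serve the steering wheel or brake pedal; fitting an equation to the samples, as the text also suggests, would replace the interpolation step.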

The position, velocity, and rate of change of targeted member positions can also be
determined, to indicate other desirable information to the computer analyzing the TV
images.

Where stereo image pairs are used, the largest freedom for action results, as the z
dimension can also be encoded. However, many control functions are unidirectional, and
thus can be dealt with as noted above using a single camera 2D image analysis.

On a broader scale, this aspect of the invention allows one to create 3D physical
manifestations of instruments in a simulation form, much as the National Instruments firm
has pioneered two-dimensional, TV-screen-only displays. In addition, such an "instrument
panel" can also be used to interact with conventional programs, even word processing,
spreadsheets and the like, where a lever moved by the user might shift a display window
on the screen, for example. A selector switch on the panel can shift to different screens
altogether, and so forth.

Figure 4 has also illustrated the use of the invention to create a simple general-purpose 
visual and tactile interface to computer programs. 

Figure 5

Figure 5a illustrates a one-person game where a targeted airplane model 505 can be
used to define the course of an airplane in a game. The orientation of the plane,
determined from targets 510, 511, and 512 (on the wings and fuselage respectively) by
camera(s) 530, is used by a program resident in computer 535 to determine its position
and orientation, and changes therein due to movement in the game. The model can be
purchased pre-targeted (where natural features such as colored circles or special
retroreflectors might be used, for example). The plane's position and/or orientation, or
change therein, is used as an input to a visual display on the computer display and to an
audio program to provide a realistic feeling of flight, or alternatively to allow the computer
to stage a duel, wherein an opposing fighter is created in the computer and displayed
either alone or along with the fighter represented by the player. The experience is
particularly enhanced when a large screen display is used, for example >42 inches diagonal.

A two-person version is shown in figure 5b, where the two computers can be linked over
the Internet or via a cable across the room. In the two-person game, airplane 510 is
targeted with target 511 and the motion is sent over a communication link 515 to a second
computer where another player has her airplane 520 with its target. The two results can
be displayed on each computer display, allowing the users to interactively modify their
position and orientation. An interrupt member can trigger the game to fire a weapon or
reconfigure the vehicle. A set of targets 514 can even be attached (e.g. with Velcro, to
her hands or wrists, and body or head) to the player 513, allowing her to "become" the
airplane as she moves around in front of the cameras. This is similar to a child today
pretending to be an airplane, with arms outstretched. It is thus a very natural type of
play, but with exciting additions of sounds and 3D graphics to correspond to the moves
made.

For example, 

• If the child's arms tilt to simulate a bank of the plane, a plane representation such as
an F16 on the screen can also bank.

• If the child moves quickly, the sounds of the jet engine can roar.

• If the child moves his fingers, for example, the guns can fire.

And so forth. In each case a position or movement of the child is sensed by the
camera and compared by the computer program to a programmed or taught movement or
position, and the result is used to activate the desired video and/or audio response and,
if desired, to transmit the positions and movements to a remote location, either raw or in
processed form (i.e. a command saying "bank left" could simply be transmitted, rather
than the target positions corresponding thereto).
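
The compare-and-respond step just described can be sketched as follows. This is only an illustrative sketch: the function names, the 15 degree threshold, and the coordinate convention (x to the player's right, y up, wing tip targets reported as 3D points) are assumptions and are not taken from the specification.

```python
import math

def bank_angle_deg(left_tip, right_tip):
    """Roll angle of the line joining two wing tip targets, in degrees.

    Each tip is an (x, y, z) tuple with x to the player's right and y up
    (an assumed convention). Positive means the right tip is higher.
    """
    dx = right_tip[0] - left_tip[0]
    dy = right_tip[1] - left_tip[1]
    return math.degrees(math.atan2(dy, dx))

def classify_bank(left_tip, right_tip, threshold_deg=15.0):
    """Compare the sensed pose to a programmed threshold; return a command."""
    angle = bank_angle_deg(left_tip, right_tip)
    if angle > threshold_deg:
        return "bank left"   # right wing raised
    if angle < -threshold_deg:
        return "bank right"  # left wing raised
    return "level"
```

In the processed mode, only the short string returned by `classify_bank` would need to be sent to the remote computer, rather than the raw target coordinates.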

Also illustrated, in figure 5c, is a one- or multi-person "Big Bird" or other hand puppet
game embodiment of the invention, played if desired over remote means such as the
Internet. It is similar to the stuffed animal application described above, except that the
players are not in the same room. And, in the case of the Internet, play is bandwidth
limited, at least today.

Child 530 plays with doll or hand puppet 550, for example Sesame Street's "Big Bird",
which can be targeted using targets 535 and 540 on its hands 551 and 552 and curvilinear
line type targets 553 and 554 outlining its upper and lower lips (beak). Target motion sensed
by stereo pair of cameras 540 and 541 is transformed by computer 545 into signals to
be sent over the internet 555 or through another communication link to allow a second
child 556 to interact, moving his doll 560 with, say, at least one target 561.

In the simplest case, each user controls one character. The results of both actions can
be viewed on each computer display.

It is noted that a simple program change can convert an airplane fighter game to
something else, for example pretending to be a model on a runway (where walking
perfectly might be the goal), or dolls that could be moved in a TV screen representation
of a doll house, itself selectable as the White House, Buckingham Palace or whatever.

We have depicted a one- or two-person airplane game according to the invention, which
can further include inputs for triggering and scene change via movement sequences or
gestures of a player. Further described are other movements, such as gripping or touch
indicating, which can be useful as input to a computer system.

The invention comprehends a full suite of gesture type inputs in up to 6 degrees of
freedom: static, dynamic, and sequences of dynamic movements.

Figure 6 

Figure 6 illustrates other movements, such as gripping or touch indicating, which can be
useful as input to a computer system. Parts of the user, such as the
hands, can describe motion or position signatures and sequences of considerable utility.

Some natural actions of this type (learned in the course of life): grip, pinch, grasp,
stretch, bend, twist, rotate, screw, point, hammer, throw.

Some specially learned or created actions of this type: defining a parameter (for example,
fingers wide apart, or spaced narrow); flipped up targets etc. on fingers (rings); a simple
actuated object with levers to move targets.

This is in effect a method of signaling action to the computer using the detected position
of one finger, two fingers of one hand, one finger of each hand, two hands, or the relative
motion/position of any of the above with respect to the human or the computer camera
system or the screen (itself generally fixed with respect to the camera system).

These actions can cause objects depicted on a screen to be acted on, by sensing using
the invention. For example, consider that the thumb 601 and first finger 602 of, let's say,
the user's left hand 605 are near an object such as a 3D graphic rendition of a cow 610
displayed on the screen 615, in this case hung from a wall, or with an image projected
thereon from behind. As the fingers are converged in a pinching motion, depicted as
dotted lines 620, the program of computer 630 recognizes this motion of fingernails 635
and 636, seen by cameras 640 and 641 connected to the computer which processes
their image, as a pinch/grasp motion and can either cause the image of the cow to be
compressed graphically or, if the hand is pulled away within a certain time, interpret it
as a grasp, in which case the cow object is moved to a new location on the screen
where the user deposits it, for example at position 650 (dotted lines). Or it could be
placed "in the trash".
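
A minimal sketch of the pinch versus grasp decision is given below. It assumes fingertip positions are reported as (x, y, z) points with z the range from the screen; the function name and the gap, pull-back and time thresholds are hypothetical values chosen only for illustration.

```python
import math

def interpret_pinch(samples, pinch_gap=10.0, pull_back=50.0, time_limit=1.0):
    """Classify a fingertip track as 'grasp', 'pinch', or None.

    samples is a list of (t, thumb_xyz, finger_xyz) tuples in time order,
    with z taken as range from the screen (an assumed convention).
    A pinch is declared when the fingertip gap closes to pinch_gap or less.
    If the hand then withdraws by at least pull_back within time_limit
    seconds of the pinch, the motion is interpreted as a grasp.
    """
    pinch_time = pinch_range = None
    for t, thumb, finger in samples:
        hand_range = (thumb[2] + finger[2]) / 2.0
        if pinch_time is None:
            if math.dist(thumb, finger) <= pinch_gap:
                pinch_time, pinch_range = t, hand_range
        elif (t - pinch_time <= time_limit
              and hand_range - pinch_range >= pull_back):
            return "grasp"
    return "pinch" if pinch_time is not None else None
```

A result of "pinch" would correspond to compressing the cow image in place, and "grasp" to picking it up for relocation.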

A microphone 655 can be used to input voice commands into the computer 630, which
can then process the command using known technology (Dragon software, IBM ViaVoice,
etc.). A typical command might be grip, move, etc., if these weren't
obvious from the detected motion itself.

In a similar manner, speakers 660 controlled by the computer can give back data to the
user, such as a beep when the object has been grasped. Where possible, for natural
effect, it is desirable that sound and action coincide: a squishing sound
when something is squished, for example.

If two hands are used, one can pinch the cow image at each end and "elongate it" in
one direction, or bend it in a curve, both motions of which can be sensed by the
invention in 3 dimensions, even though the image itself is actually represented on the
screen in two dimensions as a rendered graphic responding to the input desired (via
action of the program).

The scale of the grip of the fingers depends on the range from the screen (and from the
object thereon being gripped), so the system desirably has a variable scale factor
dependent on the detected range from the sensor (unless one is always to touch the
screen or come very near it to make the move).
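
One possible form of such a range-dependent scale factor is sketched below. The specification does not give a formula; the similar-triangle model used here, the assumption that the user's eye position is known, and the function names are all illustrative assumptions.

```python
def grip_scale(range_to_screen, eye_to_screen):
    """Scale factor mapping finger separation to on-screen separation.

    Assumed similar-triangle model: the fingers are range_to_screen in
    front of the screen and the user's eye is eye_to_screen from it, so a
    gap between the fingers projects onto the screen magnified by
    D / (D - r). At the screen surface (r = 0) the factor is 1.
    """
    if not 0 <= range_to_screen < eye_to_screen:
        raise ValueError("fingers must lie between the screen and the eye")
    return eye_to_screen / (eye_to_screen - range_to_screen)

def on_screen_grip_width(finger_gap, range_to_screen, eye_to_screen):
    """Width on the screen spanned by a finger gap sensed at a given range."""
    return finger_gap * grip_scale(range_to_screen, eye_to_screen)
```

With this model a 5 cm finger gap held halfway between the eye and the screen spans 10 cm on the screen, so a user standing back can grip a large displayed object without touching the screen.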

Pinching or gripping is very useful in combination with voice for word processing and
spreadsheets. One can move blocks of data from one place to another in a document,
or from one document to the next. One can also use it to good effect for graphics and other
construction by gripping objects, pasting them together, and then rotating them or
otherwise manipulating them, with the finger motions used being sensed by the invention.

Similarly to the pinching or grasping motion just described, some other examples which 
can also be sensed and acted on with the invention, using either the natural image of 
the fingers or hands, or of specialized datums thereon, are: 

• Point 

• Move 

• Slide 

• Grip

• Pull apart, stretch, elongate 

• Push together, squeeze 

• Twist, screw, turn 

• Hammer 

• Bend 

• Throw 



Figure 7 (block diagram)

Figure 7 illustrates the use of this invention to implement an optical based computer
input for specifying software program commands and parameters, defining new objects or
new actions in an application computer program, temporarily redefining some or all of the
database associated with the target, or calling specific computer programs, functions, or
subroutines.

A sequence of simple path segments of the targets obtained by this invention, separated
by "Quant punctuation", together with its interrupt member settings and its target ID, can
define a unique data set. We refer to this data set as a "Quant", referring to the discrete
states (much like quantum states of the atom). The end of each path segment is
denoted with a "Quant punctuation" such as a radical change in path direction, target
orientation or speed, a change in a specific interrupt member, or even a combination
of the above. The path segments are used to define a reduced or quantized set of target
path information.



A Quant has an associated ID (identification number) which can be used as a look-up
key in an associated database to find the associated program commands, parameters,
objects, actions, etc., as well as the defining characteristics of the Quant.

An example of a Quant that could be used to define a command in a CAD or drawing
system to create a rectangle might proceed as follows:

A. Hit the Q key on the keyboard to start recording a Quant.

B. Sweep the target to the right, punctuated with a short stationary pause. During the
pause, analyze the vector direction for the path segment initiated with the
Q key and ending with the pause. The first and last points of this segment define a
vector direction that is mainly to the right with no significant up/down or in/out
component. Identify this as direction 1.

C. Sweep the target upward, punctuated with a short stationary pause. During the
pause, analyze the vector direction for the path segment initiated with the
last pause and ending with the next pause. The first and last points of this segment
define a vector direction that is mainly upward with no significant left/right or in/out
component. Identify this as direction 2.

D. Sweep the target to the left, punctuated with a short stationary pause. During the
pause, analyze the vector direction for the path segment initiated with the
last pause and ending with the next pause. The first and last points of this segment
define a vector direction that is mainly to the left with no significant up/down or in/out
component. Identify this as direction 3.

E. Sweep the target down, punctuated with a short stationary pause. During the pause,
analyze the vector direction for the path segment initiated with the last
pause and ending with the next pause. The first and last points of this segment define
a vector direction that is mainly down with no significant left/right or in/out
component. Identify this as direction 4.

F. End the Quant acquisition with a key press "a" that gives additional information to
identify how the Quant is to be used.

G. In this example the Quant might be stored as a compact set of 7 numbers and letters
(4, 1, 2, 3, 4, a, 27), where 4 is the number of path segments, 1-4 are numbers that
identify path segment directions (i.e. right, up, left, down), "a" is the interrupt member
(the key press a), and 27 is the target ID. Figure 7a illustrates a flow chart as to how
target paths and Quants can be defined.

H. In another example, a continuous circular sweep rather than punctuated segments
might define a circle command in a CAD system. Some Quants might immediately
initiate the recording of another Quant that provides the information needed to
complete the prior Quant instruction.

I. Specific Quants can identify a bolt and its specific size and thread parameters,
together with information to command a computer controlled screwing device or the
drilling of a hole for this size bolt. Another Quant could identify a hinge and:

J. Define a CAD model with the specific size and manufacturing characteristics defined
by the Quant.

K. Or assign joint characteristics to a CAD model.

L. Or command a computer controlled device to bend an object at a given location and
orientation by a given amount.

M. This method can be applied to sculpture, where the depth of a planar cut or the
whittling of an object can be determined by the characteristics of the targeted
object's path (in other words, by its Quant).
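
The rectangle example of steps A through G can be sketched in code as follows. This is an illustration under stated assumptions, not the method of Figure 7a: the function names, the `dominance` tolerance used to decide that a segment is "mainly" along one axis, and direction codes 5 and 6 for the in/out axis are all hypothetical. Only codes 1 to 4 and the stored form (4, 1, 2, 3, 4, a, 27) come from the text.

```python
# Direction codes follow the example in the text: 1 right, 2 up, 3 left, 4 down.
# Codes 5 and 6 (positive and negative z) are an added assumption.
DIRECTION_CODES = {
    (0, 1): 1, (1, 1): 2, (0, -1): 3, (1, -1): 4, (2, 1): 5, (2, -1): 6,
}

def classify_segment(start, end, dominance=3.0):
    """Return the direction code of the vector from start to end.

    The vector must be 'mainly' along one axis: its largest component has
    to be at least `dominance` times every other component (an assumed
    tolerance). Returns None when no axis dominates.
    """
    delta = [e - s for s, e in zip(start, end)]
    axis = max(range(3), key=lambda i: abs(delta[i]))
    others = [abs(d) for i, d in enumerate(delta) if i != axis]
    if abs(delta[axis]) == 0 or abs(delta[axis]) < dominance * max(others):
        return None
    return DIRECTION_CODES[(axis, 1 if delta[axis] > 0 else -1)]

def encode_quant(pause_points, interrupt, target_id):
    """Build the compact Quant tuple from the points at each punctuation.

    pause_points[0] is the target position when recording starts; each
    later entry is the position at a stationary pause.
    """
    codes = [classify_segment(a, b)
             for a, b in zip(pause_points, pause_points[1:])]
    if None in codes:
        raise ValueError("a path segment has no dominant direction")
    return (len(codes), *codes, interrupt, target_id)
```

For a roughly rectangular sweep such as `[(0, 0, 0), (10, 0.5, 0), (10, 8, 0.2), (0.3, 8, 0), (0, 0, 0)]` with interrupt "a" and target ID 27, `encode_quant` yields the tuple `(4, 1, 2, 3, 4, 'a', 27)`, which could then serve as a look-up key into the associated database.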

Figure 8 

Figure 8 illustrates the use of this invention for medical applications. A user can apply
this invention for teaching medical and dental students, or controlling robotic equipment
used for example in medical and dental applications. In addition, it can be used to give
physically controlled lookup of databases and help systems.

In figure 8a, somewhat similar to fig 1 above, a scalpel has two targets 801 and 802 (in
this case triangular targets), allowing a 6 degree of freedom solution of the position and
orientation of the scalpel 811 to which they are attached, having a tip 815. Other surgical
instruments can also be used, each with its own unique targets and target IDs, if
desired, to allow their automatic recognition by the electro-optical sensing system of the
invention.

The figure shows a medical student's hand 820 holding a model of a surgical
instrument, a scalpel. A model of a body can be used to call up surgical database
information in the computer attached to the camera system about the body parts in the
vicinity of the point on the body model 825 being touched. If the targeted tool is pressed
down, compressing the spring 810 and moving the targets 801 and 802 apart, the information
displayed can refer to internal body parts. The harder the user presses down on the
spring, the farther the targets move apart, and this separation can be used to
instruct the computer to display database information for parts lower in the body. If the user
wants to look up information on drugs that are useful for organs in a given region of the
body, he might use a similar model syringe with a different target having a different ID. In
a similar way, a medical (or dental) student could be tested on his knowledge of
medicine by using the same method to identify and record in the computer the location on
the body that is the answer to a test question. Similarly, the location and orientation of
the targeted tool can be used to control the path of a robotic surgery tool.

Notice that the tool with a spring gives the user tactile feedback. Another way the user
can get tactile feedback is to use this pointer tool on a pre-calibrated material that has
the same degree of compression or cutting characteristics as the real body part.

In a preferred embodiment, each surgical device has its own unique target and its own
unique target ID. One of the unique features of this invention is that the user can use
the actual surgical tool that he normally uses in the application of his art. Thus, a dental
student can pick up a standard dental drill, and the target can be attached to a dental
drill that has the same feel as an ordinary drill.

Figure 8b shows how several objects can be attached to specialized holders that are
then attached to a baseboard to create a single rigid collection whose location and
orientation can be pre-registered and stored in a computer database, such that only a
single targeted pointer or tool need be tracked. The baseboard has one or more
specialized target attachment locations. We consider two types of baseboard/holder
attachments: fixed (such as pegboard/hole) or freeform (using for example magnets or
Velcro). Charts 8d and 8e describe how these might be calibrated.

Attachable targets can be used to pre-register the location and orientation of one or more
objects relative to a camera system and to each other, using a baseboard 839, shown
here with square pegs 837 and an attachment fixture 838 that will hold a specialized
target such as those shown as 855, 856 and 857. A set of objects, here shown as a model
of a body 840 and a model of a heart 841 with attachment points 842 and 843, is
attached to object holders 845 and 846 at attachment points 847 and 848. The object
holders can be of different shapes, allowing the user to hold the object at different
orientations and positions as desired. Each object holder has an attachment fixture (850
and 851) that will hold a specialized target. The user then picks the appropriate target
together with the appropriate fixture on the object holder so that the target is best
positioned in front of the camera to capture the location and orientation of the target.
Charts 8d and 8e describe the calibration process for a fixed and a freeform attachment
implementation respectively. Once the baseboard and targets have been calibrated, a
computer program can identify which object is being operated on and determine how
this information will be used. The steps for utilizing this system are described in Chart 8f.

Figure 8c illustrates how a dentist with a targeted drill and a target attached to a patient's
teeth can have the computer linked to the camera system perform an emergency pull
back of the drill if the patient sneezes.
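
A simple way such a pull-back decision might be made is sketched below, purely as an assumption: the specification does not say how a sneeze is detected. Here the tooth target's speed between successive camera frames is compared with a hypothetical limit; the function name and the 50 mm/s value are illustrative only.

```python
import math

def should_retract(prev_tooth_pos, tooth_pos, dt, max_speed=50.0):
    """Return True when the tooth target moves faster than max_speed.

    Positions are (x, y, z) in millimetres from successive camera frames
    dt seconds apart; max_speed (mm/s) is an assumed, illustrative limit.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    speed = math.dist(prev_tooth_pos, tooth_pos) / dt
    return speed > max_speed
```

At a 30 Hz frame rate, a 5 mm jump of the tooth target between frames corresponds to 150 mm/s and would trigger the retraction in this sketch, while a 0.1 mm drift would not.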

Many other medically related uses may be made of the invention. For example,
movement or position of a person may be sensed, and used to activate music or
3D stimulus. This is suspected to have therapeutic value when combined with music therapy
in the treatment of stroke victims and psychiatric disorders.



Similarly, the output of the sensed condition, such as hand or foot position, can be used
to control actuators linked to therapeutic computer programs, or simply for use in health
club exercise machines. Aids to the disabled are also possible.

Figure 9

Figure 9 illustrates a means for aiding the movement of a person's hands while using the
invention in multiple degree of freedom movement.

A joystick is often used for game control. Shown in fig. 9a is a joystick 905 of the
invention having an end including a ball 910, in which the data from datums on the
ball at the end of the stick is taken optically by the video camera 915 in up to 6
axes using a square retroreflective target 920 on the ball. The stick of this embodiment
itself, unlike other joysticks, is provided not as a transduction device, but to support the
user. Alternatively, some axes can be transduced, e.g. with LVDTs or resolvers, while
data in other axes is optically sensed using the invention.

When one wishes to assemble objects, one object may be held in each hand, or one
can use two joysticks as above, or one stick aide as shown here with one hand free, for
example.

Figure 9b shows an alternative to a joystick, using retroreflective material targets attached
to fingers 930, 931 and 932 resting on a floating pad 935 resting on a liquid 940 in a
container 945. The floating pad gives comfortable support to the hand while freely
allowing the targeted hand to move and rotate. We believe that this invention will help
reduce the incidence of Carpal Tunnel syndrome.

Figure 9c shows another, more natural way to use this invention that would
eliminate Carpal Tunnel syndrome. One merely lets the targeted hand 960 hang down
in front of a camera system 970, also illustrated in the context of an armrest in fig 10.



Figure 10

Figure 10 illustrates a natural manner of computer interaction for aiding the movement
of a person's hands while using the invention in multiple degree of freedom movement
with one's arms resting on an armrest of a chair, car, or the like.

As shown, user 1005 sitting in chair 1010 has his thumb and two fingers on both
hands 1011 and 1012 targeted with ring shaped retroreflector bands 1015-1020 as
shown. All of the datums are seen with stereo TV camera pair 1030 and 1031 on top of
display 1035 driven by computer 1040, which also processes the TV camera images.
Alternatively, one hand can hold an object, and the user can switch objects as desired,
in one or both of his hands, to suit the use desired, as has been pointed out elsewhere
in this application.

We have found that this position is useful for ease of working with computers, in
particular when combined with microphone 1050 to provide voice inputs as well, which
can be used for word processing and general command augmentation.

This type of seated position is highly useful for inputs to computers associated with

• CAD stations

• Cars

• Games

• Business applications

to name a few. It is noted that the armrest itself may contain other transducers to further
be used in conjunction with the invention, such as force sensors and the like.

Figure 11

This figure illustrates an embodiment wherein other variable functions in addition to
image data of scene or targets are utilized. As disclosed, such added variables can be
provided via separate transducers interfaced to the computer or, desirably, provided by the
invention in a manner to coexist with the existing TV camera pickups used for position
and orientation input.

A particular illustration of a level vial in a camera field of view illustrates as well the
establishment of a coordinate system reference for the overall 3-6 degree of freedom
coordinate system of the camera(s). As shown, level vial 1101 located on the object 1102 is
imaged by single camera 1140 along with the object, in this case having a set of 3
retro-reflective targets 1105-1107, and a retro-reflector 1120 behind the level vial to aid in
returning light from near co-axial light source 1130 therefrom (and particularly from the
meniscus 1125) to camera 1140. The camera is used both for single camera photogrammetry
to determine object position and orientation, and to determine the level in one or
two planes of the object with respect to earth.

It is noted that the level measuring device, such as a vial, inclinometer, or other device,
can also be attached to the camera, with suitable close-up optics incorporated
therewith to allow it to be viewed in addition to the scene. In this case the camera
pointing direction is known with respect to earth, or whatever is used to zero the level
information, which can be very desirable.

Clearly other variables such as identification, pressure, load, temperature, etc. can also
be so acquired by the cameras of the invention along with the image data relating to the
scene or position of objects. For example, the camera can see a target on a bimorph
responsive to temperature, or it could see the natural image of mercury in a manometer.

Figure 12 

This figure illustrates a touch screen constructed according to the invention employing
target inputs from fingers or other objects in contact with the screen, whether of the
conventional CRT variety, an LCD screen, or a projection screen, or in virtual contact
with an aerial projection in space.

As shown, a user 1201 has a targeted finger 1203, whose position in 3D space relative to
TV screen 1205 (or alternatively absolute position in room space) is observed by
camera system 1210 comprising a stereo pair of cameras (and if required light sources)
as shown above. When the user places the target 1202 on his finger 1203 in the field of
view of the cameras, the finger target is sensed, and as the range detected by the system
decreases, indicating a touch is likely, the sensor system begins reading continuously
(alternatively, it could read all the time, but this uses more computer time when not in
use). When the sensed finger point reaches a position such as "P" on the screen, or in
a plane or other surface spaced ahead a distance Z from the screen, defined as the
trigger plane, the system reads the xy location, in the xy plane of the screen, for
example.
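
The trigger plane logic described above can be sketched as follows, assuming a coordinate frame in which x and y lie in the plane of the screen and z is the finger's range measured outward from the screen surface. The function names and this convention are illustrative assumptions.

```python
def touch_event(finger_xyz, trigger_z=0.0):
    """Return the (x, y) location once the finger reaches the trigger plane.

    finger_xyz is the sensed finger target position with z measured
    outward from the screen surface (an assumed convention). trigger_z is
    the distance Z of the trigger plane ahead of the screen; 0 means the
    screen itself. Returns None while the finger is still beyond the plane.
    """
    x, y, z = finger_xyz
    if z <= trigger_z:
        return (x, y)
    return None

def first_touch(track, trigger_z=0.0):
    """Scan successive finger positions and report the first trigger, if any."""
    for position in track:
        hit = touch_event(position, trigger_z)
        if hit is not None:
            return hit
    return None
```

Setting `trigger_z` above zero gives the "virtual contact" behaviour, in which the xy location is read before the finger physically reaches the screen.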

Alternatively a transformation can be done to create artificial planes, curved surfaces or 
the like used for such triggering as well. 

Target datums on the screen, either retro-reflectors or LEDs, say at the extremities, or
projected onto the screen by electron guns or other light projection devices of the TV
system, can be used to indicate to, or calibrate, the stereo camera system of the
invention to the datum points of interest on the screen.

For example, calibration datums 1221-1224 are shown projected on the screen, either in
a calibration mode or continuously, for use by the stereo camera system, which can for
example search for their particular color and/or shape. These could be projected for a
very short time (e.g. one 60 Hz TV field), and synched to the camera, such that the
update in calibration of the camera to the screen might seem invisible to the user.

A specially targeted or natural finger can be used with the invention, or an object either
natural (e.g. a pencil point) or targeted (a pencil with a retroreflector near its tip, for
example) can be used. In general, however, the natural case is not as able to define
a point specifically, due to machine vision problems in defining its position using the limited
numbers of pixels often available in low cost cameras. The retroreflector or LED target
example is also much faster, due to the light power available to the camera system and the
simplicity of solving for its centroid, for example.
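
To illustrate why the centroid of a bright retroreflector or LED target is simple to solve for, a minimal intensity-weighted centroid is sketched below. The image representation (a list of rows of grey levels), the function name, and the threshold value are assumptions made for the example.

```python
def target_centroid(image, threshold=128):
    """Intensity-weighted centroid (row, col) of pixels above a threshold.

    image is a list of rows of grey levels. Because a retroreflective or
    LED target is far brighter than the background, a fixed threshold (an
    assumed value here) is enough to isolate it. Returns None if no pixel
    exceeds the threshold.
    """
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value > threshold:
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0:
        return None
    return (row_sum / total, col_sum / total)
```

Because the result is a weighted average over several pixels, the target location can be resolved to a fraction of a pixel, which helps offset the limited pixel counts of low cost cameras.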

This is an important embodiment, as it allows one to draw, finger paint, or otherwise
write on screens of any type, including large screen projection TVs, especially rear
projection, where the drawing doesn't obscure the video projection.

Even when front projection onto a screen is used, one can still draw, using for example
video blanking to project the screen image only where not obscured, if desired. The
cameras for viewing the targeted finger or paintbrush, or whatever is used to
make the indication, can incidentally be located even behind the screen, viewing through
the screen at the target (this assumes the screen is sufficiently transparent and
non-distorting to allow this to occur).

It is noted that the screen may itself provide tactile feel. For example, one can remove
material from a screen on which imagery is projected. This could for example be a clay
screen, with a front projection source. The object removing the material could be a
targeted finger or other object such as a sculpture tool. As discussed previously, the
actual removal of material could be only simulated, given a deformable screen feel, or
with no feel at all, if the screen were rigid.

It is also of interest that the object on which the projection is displayed need not be flat
like a screen, but could be curved to better represent or conform to the object shape
represented, or for other purposes.

The embodiment of the invention of fig 12 can be further used for computer aided
design, particularly with large screens which can give life size images, and for use with
life size tools and finger motion. The use of inputs herein described, as with respect to
the figure above, is expected to revolutionize computer aided design and related fields
in the sense of making computer use far more intuitive and able to be used effectively
by the populace as a whole.

It is extremely interesting to consider a CAD display in life size or at least large size
form. In this case, the user experience is much improved over that of today, and the user
is quicker to reach the desired result due to the much more realistic experience.
Illustrating this are applications to car and clothing design.

For example, consider the view from the bottom of an underbody of a car with all its
equipment such as cables, pipes and other components on a life size projection TV
image 1260, obtainable today at high definition with digital video projectors, especially if
one only worked with half the length of the car at once. Using the invention, a designer
1200 can walk up to the screen image (2 dimensionally displayed, or if desired in
stereoscopic 3D), and trace, with his finger 1203, the path where the complex contoured
exhaust pipe should go, a notoriously difficult design problem.

The computer 1240, taking the data from stereo pair of TV cameras 1210, can cause the
TV screen to display the car undercarriage life size, or if desired at some other scale.
The designer can look for interferences and other problems as if it were real, and can
even take a real physical part if desired, such as a pipe or a muffler, and lay it life size
against the screen where it might go, and move the other components around
"physically" with his hand, using his hand or finger tracked by the TV camera or cameras
of the system as input to the corresponding modification to the computer generated
image projected.

Multiple screens having different images can be displayed as well by the projector, with
the other screens for example showing section cuts of different sections of the vehicle,
which can further indicate the situation to the designer, viewed from different directions
or at different magnifications, for example. With the same finger, or his other hand, the
designer can literally "cut" the section himself, with the computer following suit with the
projected drawing image, changing the view accordingly.

The invention has the ability to focus one's thoughts to a set of motions that are fast,
intuitive and able to quickly and physically relate to the object at hand. It is felt by the
inventors that this will materially increase productivity of computer use, and dramatically
increase the ability of the computer to be used by the very young and old.

As noted above in the car design example, individual engineers using targeted hands
and fingers (or natural features such as finger tips), or targeted aides or tools
as described, can literally move the exhaust pipe by grabbing it on the screen using the
invention and bending it, i.e. causing a suitable computer software program in real
time to modify the exhaust pipe database to the new positions and display same on the
projected display (likely wall size).

If no database existed, a drawing tool can be grabbed, and the engineer can "draw" on
the screen where he wants the exhaust pipe to go, using his targeted finger or tool as
sensed by the TV camera or other sensor of the invention. The computer then
creates a logical routing and the necessary dimensions of the pipe, using manufacturing
data as need be to ensure it could be reliably made in an economical manner (if not, an
indication could be provided to the engineer, with hints as to what is needed).

One of the beauties of this is that it is near real, and it is something that a group of
more than one person can interact with. This gives a whole new meaning to design
functions that have historically been performed solo in front of a "tube".

For best function the screen should be a high definition TV (HDTV), such that a user
looking at one side sees good detail and can walk over to another side and also see good
detail.

As in figure 13, which follows, another useful big screen design application in full size is
to design a dress on a model. The use of the big screen allows multiple people to interact
easily with the task, and allows a person to grip a portion of the prototype dress on the
screen and move it elsewhere (in this case finger tips as targets would be useful). It also
allows normal dressmaking tools to be used, such as a targeted knife or scissors.

Figure 13 

Illustrated is clothing design using finger touch and targeted material. The invention is
useful in this application both as a multi-degree-of-freedom input aid to CAD as
disclosed elsewhere herein, and for the very real requirement to establish the
parameters of a particular subject (typically a customer, or a representative "average"
customer) or to finalize a particular style prototype.

A particular example is herein shown with respect to design of women's dresses, 
lingerie and the like, where the fit around the breasts is particularly difficult to achieve. 
As shown, the invention can be employed in several ways. 

First, the object, in this case a human or manikin, with or without clothes, can be
digitized for the purpose of planning initial cutting or sewing of the material. This is
accomplished with the invention using a simple laser pointer. It is believed that some
similar ideas have been developed elsewhere, using projection grids, light stripes or the
like. However, the digitization of the object can be accomplished at very low cost as
described below using the multicamera stereo vision embodiment of the invention.

Secondly, the cloth itself can be targeted. The multicamera stereo system can acquire
target data before tryout and/or determine the distorted data (such as position,
location or shape) after tryout, and this data can then assist in modifying the instant
material or subsequent material as desired.

Third, one can use to advantage the ability of the invention to contour and designate
action on objects in real time. For example, consider fashion model 1301 wearing dress
1302 that, let us say, doesn't fit quite right in the breast area 1303. To help fix this
problem, she (or, alternatively, someone else) can rub her targeted finger 1310 on the
material to instruct the computer 1315, connected to stereo camera 1316 (including light
sources as required), either of her own shape (which could also have been done without
clothes on) relative to the shape of the material on her, or of the shape, or lack of
shape, she thinks it should have (the lack of shape being addressed, for example, by
eliminating a fold, crease, or bunching up of the dress material). Data from multiple
sequential points can be taken as she rubs her finger over herself, obtaining her finger
coordinates via the invention and digitizing the shape in the area in question along the
path traveled.

Such instruction to the computer can, for example, be by voice recording (for later
analysis, for example) or even instant automatic voice recognition. In addition, or
alternatively, it can be via some movement, such as a hand movement indication she
makes, which can carry pre-stored and user programmable or teachable meaning to the
computer (described also in fig. 7 above and elsewhere herein). For example, moving her
finger 1310 up and down in the air may be sensed by the camera and discerned as a signal
to let out material vertically. A horizontal wave would signal to do so horizontally.
Alternatively, she might hold an object with a target in her other hand and use it to
provide a meaning. As further disclosed in fig. 6, she can make other movements which
can be of use as well. By pinching her fingers, which could be targeted for ease of
viewing and recognition, she could indicate taking up material (note she can even pinch
the material of a prototype dress just as she would in real life).
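The vertical versus horizontal wave described above can be illustrated with a minimal
sketch. It assumes the camera system already delivers a short track of finger positions;
the function name, the units, the axis convention and the travel threshold are all
hypothetical and are not taken from the specification.

```python
def classify_wave(track, min_travel=0.05):
    """Classify a finger track as a vertical or a horizontal wave.

    track: list of (x, y) finger positions, x horizontal and y vertical
    (assumed units: metres). Returns "let_out_vertical", "let_out_horizontal",
    or None when the total motion is too small to count as a gesture.
    """
    if len(track) < 2:
        return None
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    # Total back-and-forth travel along each axis, so a wave that returns
    # to its starting point still registers as motion.
    x_travel = sum(abs(b - a) for a, b in zip(xs, xs[1:]))
    y_travel = sum(abs(b - a) for a, b in zip(ys, ys[1:]))
    if max(x_travel, y_travel) < min_travel:
        return None
    return "let_out_vertical" if y_travel > x_travel else "let_out_horizontal"
```

A taught or user programmable system would presumably replace the two fixed labels with
meanings stored per user; the sketch only shows the discrimination step.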

It is noted that the model could alternatively point a laser pointer such as 1320 with
spot 1321 at the point on herself needed, the 3D coordinates of the laser-designated
point being determined by the stereo cameras imaging the laser spot. This too can be
done with a scanning motion of the laser to obtain multiple points. Zones other than
round spots can be projected as well, such as lines formed with a cylinder lens. This
allows a sequence of data points to be obtained from a highly curved area without moving
the laser, which can cause motion error. Alternatively, she could use a targeted object,
such as scissors or a ruler, rather than just her finger, to touch herself with, but
this is not as physically intuitive as one's own touch.
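As an illustration of how two cameras can yield the 3D coordinates of a laser spot, the
following sketch uses the simplest textbook case: two identical pinhole cameras with
parallel optical axes and a known baseline. That geometry, and every name and parameter
below, is an assumption made for the example; the specification does not state which
camera arrangement or solution method is used.

```python
def triangulate_spot(left_px, right_px, focal_px, baseline_m, center_px):
    """3D position of a spot seen by two parallel, identical pinhole cameras.

    left_px, right_px: (column, row) of the spot image in each camera.
    focal_px: focal length expressed in pixels (assumed known from calibration).
    baseline_m: distance between the two camera centres, in metres.
    center_px: (column, row) of the optical centre in the image.
    Returns (x, y, z) in metres relative to the left camera.
    """
    (xl, yl), (xr, _) = left_px, right_px
    cx, cy = center_px
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("spot must have positive disparity")
    z = focal_px * baseline_m / disparity  # range from similar triangles
    x = (xl - cx) * z / focal_px
    y = (yl - cy) * z / focal_px
    return (x, y, z)
```

Scanning the laser and calling such a function once per frame would give the sequence
of surface points the text describes.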

A microphone 1340 may be used to pick up the model's voice instruction for the
computer. Since instruction can be made by the actual model trying on the clothes,
others need not be present. This saves labor in effecting the design or modification
input, and perhaps in some cases is less embarrassing. Such devices might then be used
in clothing store dressing rooms, to instruct minor modifications to otherwise
ready-to-wear clothes desired for purchase.

In many applications, a laser pointer can have other uses as well in conjunction with the
invention. In another clothes related example, a designer can point at a portion of a
model, or clothes on the model, and the system can determine where the point falls in
space, or relative to other points on the model or clothes on the model (within the
ability of the model to hold still). Additionally, or alternatively, the pointer can
also be used to indicate to the computer system what area is in need of work, say by
voice, or by the simple act of pointing, with the camera system picking up the pointing
indication.

It is also noted that the pointer can project a small grid pattern (crossed lines, dot
grid, etc.) or a line or a grille (parallel lines) on the object to allow multiple
points in a local area of the object to be digitized by the camera system. Such local
data, say in a portion of the breast area, is often all that is needed for the designer.
This is illustrated by pointer projector 1350 projecting a dot grid pattern of 5 x 5, or
25, equally spaced spots 1355 (before distortion in the camera image caused by curvature
of the object) on a portion of bra 1360. Picking up the spot images with the stereo
cameras over not too curved areas is not too difficult. If the points cannot be machine
matched in the two stereo camera images by the computer program, such matching can be
done manually from a TV image of the zone. Note that different views can also be taken,
for example with the model turning slightly, which can aid matching of points observed.
Or alternatively, added cameras from different directions can be used to acquire points.

Note too the unique ability of the system to record in the computer, or on a magnetic or
other storage medium for example, a normal grayscale photographic image as well as the
triangulated spot image. This is of considerable use, both in storing images of the
fashion design (or lack thereof) and in matching of stereo pairs and understanding of
the fitting problem.

Figure 14

Figure 14 illustrates additional applications of alias objects such as those of figure 3,
for the purpose of planning visualization, building toys, and inputs in general. As
shown, a user, in this case a child 1401, desires to build a building with his blocks,
such as 1410-1412 (only a few of his set illustrated for clarity). He begins to place
his blocks in front of a camera or cameras of the invention, such as cameras 1420 and
1421, which obtain a stereo pair of images of points on his blocks which may be easily
identified, such as corners or dot markings such as those shown (which might be on all
sides of the blocks), and which desirably are retro-reflective or otherwise of high
contrast. Rectangular colored targets on rectangular blocks are a pleasing combination.

As he sequentially places his blocks to build his building, images of a building can be
made to appear via software running in computer 1440, based on inputs from cameras 1420
and 1421, shown here located on either side of TV screen 1430. These images, such as
1450, can be in any state of construction, and can be of any building, e.g. the Empire
State Building, or a computer generated model of a building. Or, by changing the
software concerning the relevant images to be called up or generated, he could be
building a ship, a rocket, or whatever.

Similarly, such an arrangement of a plurality of objects can be used for other purposes,
such as for physical planning models in 3D as opposed to today's computer generated
PERT charts, Gantt charts, and organization charts in 2D. Each physical object, such as
the blocks above, can be coded with its function, which itself can be programmable or
selectable by the user. For example, some blocks can be bigger or of different shape or
other characteristic in the computer representation, even if in actuality they are the
same or only slightly different, say for ease of use or cost reasons. The target on the
block can optically indicate to the computer what kind of block it is.

Another application would be plant layout, where each individual block object could be a
different machine, and could even be changed in software as to which machine was which.
In addition, some blocks could, for example, represent machine tools in the computer,
others robots, and so on.

Figure 15 

Figure 15 illustrates a sword play video game of the invention using one or more
life-size projection screens. While large screens aren't needed to use the invention,
the physical nature of the invention's input ability lends itself to them.

As shown, player 1501 holds sword 1502 having 3 targets 1503-1505 whose positions in
space are imaged by stereo camera photogrammetry system (single or dual camera) 1510,
with retro-reflective IR illumination source 1511, so that the position and orientation
of the sword can be computed by computer 1520 as discussed above. The display, produced
by overhead projector 1525 connected to computer 1520, is a life size or near life size
HDTV projection TV image 1500 directly in front of the player 1501, immersing him in the
game more so than in conventional video games, as the image size is what one would
expect in real life.

Let us now consider further how this invention can be used for gaming. In many games it
is desired to change the view of the player with respect to the room or other location,
to look for aliens or what have you. This is typical of "kick and punch" type games, but
many other games are possible as well. Regardless, the viewpoint is easily adapted here
by turning the head, and targeting the head has been shown and described above, and in
copending applications by Tim Pryor.



This however raises an interesting question as to whether, in turning the head, one is
actually looking away from the game, if the game is on a small screen. This explains
why a larger screen is perhaps desirable. But if one sits in front of a large screen,
say 40" diagonal or more, one may feel that a little joystick or mouse is much too small
as the means to engage computer representations of the opponents. However, using this
invention one can simply have a targeted finger or an object in one's hand that could be
pointed, for example. It is far more natural, especially with larger screens, which
themselves give more lifelike representations.

The whole game indeed may actually be on a human scale. With very large projection TV
displays, the enemies or other interacting forces depicted on the screen can in fact be
human size and can move around, by virtue of the computer program control of the
projection screen, just the same as they would in life. This however makes it important,
and is indeed part of the fun of using the invention, to employ human size weapons that
one might use, including but not limited to one's own personally owned weapons, targeted
according to the invention if desired for ease of determining their location. The
opponent's actions can be modeled in the computer to respond to those of the player
detected with the invention.

A game for two or more players can also be created where each player is represented by a
computer modeled image on the screen, and the screen representations fight or otherwise
interact based on data generated concerning each player's positions or the positions of
objects controlled or maneuvered by the players. The same stereo camera system can, if
desired, be used to see both players if they are in the same room.

For example, in the same or alternatively in another game, the player 1549 may use a
toy pistol 1550 which is also viewed by stereo camera system 1510 in a similar manner
to effect a "shootout at the OK corral" game of the invention. In this case the player's
hand 1575 or holster 1520 and pistol 1585 may be targeted with one or more targets as
described in other embodiments and viewed by the stereo camera (single or dual) system
of the invention, as in the sword game above. On the screen in front of the player is a
video display of the OK corral (and/or other imagery related to the game) with "bad
guys" such as that represented by computer graphics generated image 1535, who may be
caused by the computer game software to come into view or leave the scene, or whatever.

To play the game in one embodiment, the player draws his gun when a bad guy draws his
and shoots. His pointing (i.e. shooting) accuracy and timing may be monitored by the
target-based system of the invention, which can determine the time at which his gun was
aimed and where it was aimed (desirably using one or more targets or other features of
his gun to determine pointing direction). This is compared in the computer 1520 with the
time taken by the bad guy drawing, to determine who was the winner, if desired both in
terms of time and of accuracy of aiming of the player.
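The comparison of aim and timing just described might be sketched as follows. The sketch
assumes the photogrammetry system has already produced a muzzle position and a pointing
direction in a coordinate frame where the screen is the plane z = 0; that frame, the
circular target zone and all function names are hypothetical choices for illustration.

```python
import math

def aim_point_on_screen(muzzle, direction, screen_z=0.0):
    """Intersect the gun's pointing ray with the screen plane z = screen_z."""
    mx, my, mz = muzzle
    dx, dy, dz = direction
    if dz == 0:
        return None  # gun held parallel to the screen
    t = (screen_z - mz) / dz
    if t <= 0:
        return None  # gun pointing away from the screen
    return (mx + t * dx, my + t * dy)

def judge_duel(player_time_s, aim_xy, target_xy, target_radius, bad_guy_time_s):
    """Return "player" only if the shot landed on target and was the faster one."""
    if aim_xy is None:
        return "bad_guy"
    hit = math.dist(aim_xy, target_xy) <= target_radius
    return "player" if hit and player_time_s < bad_guy_time_s else "bad_guy"
```

Judging on time alone, or on accuracy alone, would be a one-line change to the final
condition.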

An added feature is the ability of a TV camera of the invention (one of the cameras used
for datum detection, or a separate camera such as 1580) to take a normal 2D color
photograph or TV image 1588 of a player or other person 1586 and, via computer software,
superpose it on, or otherwise use it to create via computer techniques, the image of one
of the bad (or good) guys in the game! This adds a personal touch to the action.

Thanks to the transmission properties of fiber cable, ISDN, the Internet or whatever,
gaming data can be transmitted so that game opponents, objects and such can be in
diverse physical places. On their screen they can see you, and on your screen you would
see them, with the computer then, upon any sort of hit, changing their likeness to be
injured or whatever.

Figure 15B illustrates on pistol 1585 a target indicator flag 1584 which is activated to
signal the TV camera or cameras 1510 observing the pistol orientation and position.
When the trigger is pulled, the flag with the target pops up, indicating this event.
Alternatively, an LED can be energized to light (run by a battery in the toy) instead of
the flag raising. Alternatively, a noise such as a "pop" can be made by the gun, which
noise is picked up by a microphone 1521 whose signal is processed using taught sounds
and/or signature processing methods known in the art to recognize the sound, and used
to signal the computer 1520 to cause the projected TV image 1500 to depict desired
action imagery.

In one embodiment of the shooting game just described, a bad guy or enemy depicted on
the screen can shoot back at the player, and if so, the player needs to duck the bullet.
If the player doesn't duck (as sensed by the TV camera computer input device of the
invention), then he is considered hit. The ducking reflex of the player to the gun being
visibly and audibly fired on the screen is monitored by the camera, which can look at
datums on, or the natural features of, the player, in the latter case for example the
center of mass of the head or the whole upper torso moving from side to side or downward
to duck the bullet. Alternatively, the computer TV camera combination can simply look at
the position, or changes in the position, of the target datums on the player. The center
of mass in one embodiment can be determined by simply determining the centroid of pixels
representing the head in the gray level TV image of the player.
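The centroid computation mentioned above can be sketched in a few lines. The sketch
assumes, purely for illustration, that the head pixels can be separated from the
background by a single gray level threshold and that the image is available as a list
of rows; the function names and the shift threshold are hypothetical.

```python
import math

def head_centroid(image, threshold):
    """Centroid (row, col) of all pixels brighter than threshold, or None.

    image: list of rows, each a list of gray levels.
    """
    count = row_sum = col_sum = 0
    for r, row in enumerate(image):
        for c, level in enumerate(row):
            if level > threshold:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)

def has_ducked(before, after, min_shift_px):
    """True if the centroid moved at least min_shift_px between two frames."""
    return math.dist(before, after) >= min_shift_px
```

In use, the centroid found at the moment the on-screen gun fires would be compared with
the centroid a fraction of a second later to decide whether the player ducked.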

It is noted that both the sword and the pistol are typically pointed at the screen, and
since both objects are extensive in the direction of pointing, the logical camera
location is preferably to the side or overhead, rather than on top of or beside the
screen, say. In addition, line targets aligned with the object axis, such as 1586 on
pistol 1585, are useful for accurately determining the pointing direction of the object
with a stereo camera pair.

Where required, features or other data of the sword and pistol described, or the user, or
other objects used in the game, may be viewed with different cameras 1590 and 1591
(also processed by computer 1520) in order that at any instant in the game, sufficient
data on the sword (or pistol, or whatever) position and/or orientation can be determined
regardless of any obscuration of the targets or other effects which would render targets
invisible in a particular camera view. Preferably, the computer program controlling the
sensors of the game or other activity chooses the best views, using the targets
available.

In the case illustrated, it is assumed that target location with respect to the database
of the sword is known, such that a single camera photogrammetry solution as illustrated
in fig. 1b can be used if desired. Each camera acquires at least 3 point targets (or
other targets such as triangles allowing a 3D solution) in its field, and solves for the
position and orientation using those three, combined with the object database. In one
control scheme, camera 1590 is chosen as the master, and only if it can't get an answer
is camera 1591 data utilized. If neither can see at least 3 targets, then data from each
camera as to target locations is combined to jointly determine the solution (e.g. 2
targets from each camera).
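The master and fallback control scheme above amounts to a short decision rule, sketched
here under stated assumptions. Targets are represented only by identifiers, the pose
solver itself is omitted, and the function name and return format are hypothetical.

```python
def choose_pose_source(master_targets, backup_targets, min_targets=3):
    """Decide which camera observations feed the pose solution.

    master_targets, backup_targets: identifiers of the targets each camera
    currently sees (for example camera 1590 as master, 1591 as backup).
    Returns (mode, observations), where observations is a list of
    (camera, target_id) pairs and mode is "master", "backup", "combined"
    or "none".
    """
    if len(master_targets) >= min_targets:
        return ("master", [("master", t) for t in master_targets])
    if len(backup_targets) >= min_targets:
        return ("backup", [("backup", t) for t in backup_targets])
    # Neither camera alone suffices: pool the observations from both.
    combined = [("master", t) for t in master_targets]
    combined += [("backup", t) for t in backup_targets]
    if len(combined) >= min_targets:
        return ("combined", combined)
    return ("none", [])
```

Whether a pooled set of observations is actually solvable depends on the geometry; the
sketch only counts observations, as the example in the text does.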

The primary mode of operation of the system could alternatively be to combine data from
two cameras at all times. Often the camera location of choice is to the side or
overhead, since most games are played more or less facing the screen with objects that
extend in the direction of the screen (and often as a result are pointed at the screen).
For many sports, however, a camera location looking outward from the screen is desired
due to the fact that datums may be on the person or an object. In some cases cameras may
be required in all 3 locations to assure an adequate feed of position or orientation
data to computer 1520.

The invention benefits from having more than 3 targets on an object in a field, to
provide a degree of redundancy. In this case, the targets should desirably be
individually identifiable, either due to their color, shape or other characteristic, or
because of their location with respect to easily identifiable features of the sword
object.

Alternatively, one can use single targets of known shape and size, such as triangles,
which allow one to use all the pixel points along an edge to calculate the line, thus
providing redundancy if some of the line is obscured.
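The redundancy gained by using every pixel along an edge can be illustrated with a
principal-axis line fit, one common way of fitting a line to 2D points. The
specification does not say which fitting method is used, so this choice and the function
name are assumptions.

```python
import math

def fit_edge_line(points):
    """Fit a line to edge pixels; returns (centroid, unit direction vector).

    points: list of (x, y) pixel coordinates detected along one target edge.
    The direction is the principal axis of the point set, so the fit does not
    favour either image axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two edge points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))
```

Dropping the points from an obscured stretch of the edge changes the centroid but, for
a straight edge, leaves the fitted direction unchanged, which is the redundancy the text
refers to.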

Note that one can use the simple tracking capability of the invention to obtain the
coordinates of a target on a user in a room with respect to the audio system and, if
desired, also with respect to other room objects influencing sound reverberation and
attenuation. These coordinates can then be used by a control computer (not shown) for
the purpose of controlling an audio system to direct sound from speakers to the user, by
control of the phase and amplitude of emission of sound energy. While a single target on
a hat can be simply detected and its 3D location determined by the two or more camera
stereo imaging and analysis system of the invention, natural features of the user could
alternatively, or in addition, be used, such as determining the user's head location
from the gray level image detected by the TV camera of fig. 1, say. As pointed out
elsewhere, the target can be on the body, and the head can be found knowing the target
location, to simplify identification of the head in an overall image of a complex room
scene, say.
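One simple reading of controlling phase and amplitude from the user's coordinates is to
delay and attenuate the nearer speakers so that sound from every speaker reaches the
user at the same time and level. The sketch below assumes free-field propagation with
amplitude falling off as one over distance and ignores reverberation; the model, the
constant and the function name are assumptions rather than the method of the invention.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature

def speaker_settings(user_pos, speaker_positions):
    """Per-speaker (delay_s, gain) aligning arrival time and level at the user.

    user_pos: (x, y, z) of the tracked target on the user, in metres.
    speaker_positions: list of (x, y, z) speaker locations, in metres.
    """
    distances = [math.dist(user_pos, s) for s in speaker_positions]
    farthest = max(distances)
    if farthest == 0:
        return [(0.0, 1.0) for _ in distances]
    settings = []
    for d in distances:
        delay = (farthest - d) / SPEED_OF_SOUND_M_S  # nearer speakers wait
        gain = d / farthest                          # nearer speakers play softer
        settings.append((delay, gain))
    return settings
```

As the tracked coordinates change, the settings would simply be recomputed for the new
position.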

Besides control of audio sound projection, such coordinate data can also be used to
control the screen display, to allow stored images to be directed in such a way as to
best suit a user in a given part of a room, for example using directional 3D projection
techniques. If the user's head angle is determined as well, then the viewpoint of the
display can be further controlled therefrom.

Data Transmission 

Programs used with the invention can be downloaded from a variety of sources. For
example:

• Disc or other storage media packed with an object such as a toy, preferably one with
easily discernible target features, sold for use with the invention.

• From remote sources, say over the Internet, for example the web site of a sponsor of
a certain activity. For example, daily downloads of new car driving games could come
from a car company's web site.

• A partner in an activity, typically connected by phone modem or Internet, could
exchange not only game software, for example, but also the requisite drivers to allow
one's local game to be commanded by data from the partner's activity over the
communication link.

One of the interesting aspects of the invention is to obtain instructions for the
computer controlling the game (or other activity being engaged in) using the input of
the invention, from remote sources such as over the Internet. For example, let us say
that General Motors wanted to sponsor the car game of the day, played with a toy car
that one might purchase at the local Toys-R-Us store, with its basic dashboard, steering
wheel, brake panel, accelerator, gear lever, etc., all devices that can easily be
targeted and inputted via the video camera of the invention of figure 4.

Today such a game would simply be purchased, perhaps along with the dashboard kit and
the initial software on DVD or CD ROM. In fact those media could typically hold perhaps
ten games of different types.

For example, in the GM case, one day it could be a Buick and the next day a Corvette
and so on, with the TV view part of the screen changing accordingly.

Remote transmission methods such as the Internet, ISDN, or fiber links, dedicated or
shared or otherwise, are all possible and very appealing using the invention. This is
true in many things, but particularly in this case, since the actual data gathered could
be reduced to small amounts of transmitted data.

The stereo photogrammetric activity at the point of actual determination can be used
directly to feed data to the communications media. Orientation and position of objects,
or multiple points on objects or the like, can be transmitted with very little
bandwidth, which is much less difficult than having to transmit the complete image. In
fact, one can transmit the image using the same cameras and then use the computer at the
other end to change the image in response to the data transferred, at least over some
degree of change. This is particularly true if one transmits a prior set of images that
corresponds to different positions. These images can be used at any time in the future
to play the game by simply calling them up from the transmitted datums.
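To give a feel for the bandwidth difference, the following sketch packs one position and
orientation into a fixed binary record and compares its size with one uncompressed gray
level frame. The six-value layout, the 32-bit float encoding and the 640 x 480 frame
size are assumptions chosen for the example and are not taken from the specification.

```python
import struct

# x, y, z, roll, pitch, yaw as little-endian 32-bit floats (assumed layout).
POSE_FORMAT = "<6f"

def pack_pose(x, y, z, roll, pitch, yaw):
    """Encode one object pose as a fixed-size byte string for transmission."""
    return struct.pack(POSE_FORMAT, x, y, z, roll, pitch, yaw)

def unpack_pose(payload):
    """Decode a byte string produced by pack_pose back into six floats."""
    return struct.unpack(POSE_FORMAT, payload)

pose_bytes = len(pack_pose(0.1, 0.2, 1.5, 0.0, 10.0, 90.0))
frame_bytes = 640 * 480  # one 8-bit gray level frame at the assumed resolution
```

Under these assumptions a pose update occupies 24 bytes against 307,200 bytes for a
single frame, which is why the receiving computer can regenerate the scene from pose
data far more cheaply than it could receive the imagery.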



Similar to the playing function of figures 5, 15 etc., there is also a teaching
function, as was discussed relative to medical simulations in fig. 8. The invention is,
for example, also useful in the teaching of ballet, karate, dance and the like. The
positions and orientations of portions of the ballerina or her clothes can be determined
using the invention, and compared to computer modeled activity of famous ballerinas, for
example. Or, in a simpler case, a motion of the student can be used to call from a
memory bank TV images which were taken of famous ballerinas, or of her instructor, doing
the same move. And, given the remote transmission capability, her instructor may be in
another country. This allows at least reconstructed motion at the other end using a very
small amount of transmitted data, much the same as we would reconstruct the motion of a
player in the game.

While this doesn't answer the question of how the instructor in the ballet studio
actually holds the student on occasion, it does help the student to get some of the
movement correct. It also allows one to overlay, visually or mathematically, the
movements of the student, which have now been digitized in three dimensions, on the
digitized three dimensional representation of famous ballerinas making the same basic
moves, such as pas-de-chat. This allows a degree of self-teach capability, since clearly
one might wish to look at the moves of perhaps three or four noted ballerinas and
compare.

The invention thus can use to advantage 3D motion determination done at very low cost in
the home or in a small time ballet studio but nonetheless linked through CD ROM, the
Internet or other media to the world's greatest teachers or performers. What holds true
for ballet generally would also hold true for any of the sports, artistic or otherwise,
that are taught in such a manner. These can particularly include figure skating, golf or
other sports that have to do with the moves of the person themselves.



One can use the invention to go beyond that, to the moves of the person themselves
relative to other persons. This is particularly discussed in the aforementioned
co-pending application relative to soccer and hockey, particularly relative to those
sports that have goaltenders against whom one is trying to score a goal. Or conversely,
if you're the goaltender, learning defense moves against other teams that are trying to
score on you. In each case one could have a world famous goalie instructing, just as in
the ballet above, or one could have world famous forwards acting against you.

This is a very exciting thing in that you get to play the "best" using the invention.
These can even use excerpts from famous games like the Stanley Cup, World Cup and so on.
As in the other examples above, the use of 3D stereo displays for games, for sports, for
ballet or other instruction is very useful, even if it requires wearing well known
stereo visualization aids such as TV frame controlled LCD based or polarized glasses.
However, a lot of these displays are dramatic even in two dimensions on a large screen.

Let us now consider how the game would work with two players in the same room, where
play would be either with respect to themselves or with respect to others.

While there are cases of coordinated movements for the same purpose, as in figure
skating, ballet and the like, most such games are one person relative to the other, as
in sword play, pistol duels, karate, and so on. In what mode does this particularly
connect with the invention?

In figure 5 above we've illustrated the idea of two children playing an airplane game.
In this case, they are playing with respect to themselves, not directly, but rather
indirectly by viewing the results of their actions on the screen, and it is on the
screen that the actual event of their interaction takes place. In addition it should be
noted that a single player can hold an airplane in each hand and stage the dogfight
himself.

In the case shown it was an airplane dogfight, one with respect to the other. As
discussed, however, one can, using the invention, simply by changing one's command cues,
by movements, gestures or another mode desired, change it from an airplane to a ship, or
even change it from airplanes to lions and tigers. This is determined in the software
and the support structure around the software.

The actual movements of the person or objects are still determined and still come into
play. There are differences, though, of course, because in the case of lions and tigers
one might wish to definitely target the mouth so that one could open one's jaws and eat
the other person, or whatever one does.

The targeting of a beak outline was illustrated in the Big Bird Internet puppet example
of fig. 5. Curvilinear or line targets are particularly useful for some of these, as
opposed to point targets. Such targets are readily available using retro-reflective
beading as is commonly found today on some athletic shoes, shirts, or jackets for the
purpose of reflecting at night.

The use of two co-located players, one versus the other but through the medium of the
screen, is somewhat different. But if the screen is large enough, it gives the ability
to be real. In other words, the player on the screen is so large and so proportional
that one overlooks the fact that the opponent one engages is not the real player in the
room, but rather his representation on the screen. Any sort of game can be done this way
where the sensed instruments are pistols, swords and the like.

In many cases the object locations and orientations sensed are simply those of the
objects relative to the camera system. But oftentimes what is desired is the relative
position of either the people or the objects, as has been discussed in referenced US
patent applications by Tim Pryor.

Now described is a teaching embodiment of the invention, also for use remotely over the
Internet or otherwise, in which ballet instruction is given, or architecture is taught
or accomplished. The teaching session can be stored locally or transmitted over a
computer link such as the Internet. Karate or dance, for example, can be taught over the
Internet. Targets, if required, can be attached to arms, hands, legs, or other parts of
the body. The paths of the user's body parts can be tracked in space and time by one or
more camera systems. The video can be analyzed in real time or can be recorded and
analyzed later.

The TV image data can ultimately even be converted to "Quant" data representing
sequences of motion detected by the camera system, for compact data transmission and
storage. In this case, the specific path data could be recognized as a specific karate
thrust, say. This motion, together with its beginning and end locations and orientation,
may be adequate for an automatic system. On the other hand, a two-way Internet
connection would allow the instructor's move to be compared with that of the student. By
reducing the data to Quant data, the instructor's and student's size differences could
be factored out.
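The idea of factoring out size differences can be illustrated, without attempting to
reproduce the "Quant" representation itself, by normalizing each tracked path before
comparison. The sketch assumes both paths are sampled at the same number of points and
uses path length as the size measure; both choices, and the function names, are
hypothetical.

```python
import math

def normalize_path(path):
    """Shift a 3D path to start at the origin and scale it to unit length."""
    x0, y0, z0 = path[0]
    shifted = [(x - x0, y - y0, z - z0) for x, y, z in path]
    length = sum(math.dist(a, b) for a, b in zip(shifted, shifted[1:]))
    if length == 0:
        raise ValueError("path contains no motion")
    return [(x / length, y / length, z / length) for x, y, z in shifted]

def path_difference(path_a, path_b):
    """Mean distance between corresponding samples of two normalized paths."""
    if len(path_a) != len(path_b):
        raise ValueError("paths must have the same number of samples")
    a = normalize_path(path_a)
    b = normalize_path(path_b)
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

A student who performs the instructor's move at half the size and in a different part
of the room then scores a difference near zero, while a differently shaped move does
not.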

The invention can be used to determine position and orientation of everyday objects for
training and other purposes. Consider that the position and orientation of a knife and
fork in one's hands can be detected and displayed or recorded if target datums are
visible to the camera system, either natural (e.g. a fork tip end) or artificial, such
as a retro-reflective dot stuck on. This allows one to teach proper use of these tools,
and for that matter any tools, such as wrenches, hammers, etc., indeed any apparatus
that can be held in the hands (or otherwise). The position of the apparatus held, with
respect to the hands or other portions of the body or other bodies, may be determined as
well.

This comes into clear focus relative to the teaching of dentists and physicians,
especially surgeons. Scalpels, drills, and the like may all be targeted or otherwise
provided with natural features such as holes, slots, and edges which can work with the
invention.

In the military such training aids are of considerable use, and serve as well as an aid to
inspiring young recruits, for whom the TV display and video game aspect can render
a perhaps dull task fun. The proper ergonomic way to dig a foxhole or hold a rifle could
be taught this way, just as one could instruct an autoworker on an assembly line
installing a battery in a car.

Figure 16 

Fig 16 illustrates an embodiment of the invention suitable for use on airplanes and in other
tight quarters. A computer having an LCD screen 1610, which can be attached if desired to
the back of the seat ahead 1605 (or to any other convenient member), has on either side of
the screen, near the top, two video cameras 1615 and 1616 of the invention, which view
the workspace on and above the tray table folding down from the seat ahead. The user
communicates with the computer using a microphone (for best reception a headset type,
not shown, connected to the computer), the computer converting voice to letters and words
using known voice recognition techniques. For movement of words, paragraphs, and portions
of documents, including spreadsheet cells and the like, the user may use the invention.

In the form shown, he can use a variety of objects as has been discussed above. For
simplicity, consider battery powered LED 1620 on his finger 1625, which emits in a
narrow wavelength region which is passed by band pass filters (not shown for clarity) on
the front of cameras 1616 and 1615, respectively. Since a full 3 degree of freedom
location of the finger LED is possible, movement of the finger off the table (which
otherwise becomes a sort of mouse pad, or touch pad in 2 axes) can be used to optionally
signal the program to perform other functions. Or if there are 3D graphics to interact
with, it can be of great utility for them. Indeed, other fingers, or those of the other hand, can
also carry LED targets which allow many functions described herein to be performed
in up to 6 axes.
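
By way of a hedged illustration, the 3 degree of freedom location of the finger LED can
be sketched with the standard parallel-camera stereo relation, depth = focal length x
baseline / disparity. The sketch below assumes two identical, parallel, rectified
cameras with image coordinates measured from each image centre; the function name,
focal length, and baseline are hypothetical values chosen for the example, not
parameters given in this specification.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m):
    """Locate a point in 3-D from its image position in two parallel cameras.

    x_left, x_right, y: image coordinates in pixels, relative to the image centre.
    focal_px: lens focal length expressed in pixels (assumed equal for both cameras).
    baseline_m: distance between the two camera centres in metres.
    Returns (x, y, z) in metres in the left camera's coordinate frame.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("point must lie in front of both cameras")
    z = focal_px * baseline_m / disparity   # depth from disparity
    x = x_left * z / focal_px               # back-project the left image ray
    y_m = y * z / focal_px
    return x, y_m, z

# Hypothetical reading of the finger LED as seen by cameras 1615 and 1616.
x, y, z = triangulate(x_left=50.0, x_right=-50.0, y=20.0,
                      focal_px=800.0, baseline_m=0.3)
print(x, y, z)
```

Comparing the recovered height of the LED against the known tray table surface is one
way the program could tell that the finger has been lifted off the table.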

One can also place a normal keyboard such as 1650, interfaced to the computer (built
into the back of the LCD display for example), on the tray table (or other surface), and use
the LED equipped finger(s) to type normally. But a wide variety of added functions can
again be performed by signaling the computer with the LED targets picked up by the
video cameras. There can be movement gestures to signal certain windows to open, for
example. Other functions are:
1. Pointing with a finger carrying a target, and 3 points on the wrist, at an icon or other
detail depicted on the screen.

2. Extending values out of a chart in the 3rd dimension by pulling with targeted fingers in the
manner described in figure 6.

3. Solid icons can be placed on the tray table and detected, in this case each having a
small LED or LEDs and battery. These can be moved on the table to connote meaning
to the computer, such as the position of spreadsheet cells or work blocks in a PERT
chart, and the like.

4. Using the cameras to detect the position of a laser spot on an object on the tray illuminated by a
laser pointer held in the hand of the user (preferably the laser wavelength and LED
wavelength would be similar to allow both to pass the bandpass filters).

5. It is noted that the screen could be larger than otherwise used for laptop computers, since
it is all out of the way on the back of the seat (or, at a regular desk, can stand up with
folding legs for example). The whole computer can be built into the back of the
device (and is thus not shown here for clarity).

6. A storage space for targeted objects used with the invention can be built into the
screen/computer combination or carried in a carrying case. Attachments such as
targets for attachment to fingers can also be carried.

7. It is noted that for desk use the invention allows human interaction with much larger
screens than would normally be practical. For example, if the screen is built into the
desktop itself (say tilted at 45 degrees like a drafting board), the user can grab/grip/pinch
objects on the screen using the invention, and move them, rotate them, or otherwise
modify their shape, location or size, for example using natural learned skills. Indeed a
file folder can be represented literally as a file folder of normal size, and documents
pulled out by grabbing them. This sort of thing works best with high resolution
displays capable of the detail required.

Figure 16 has illustrated an embodiment of the invention having a mouse and/or
keyboard of the conventional variety combined with a target of the invention on the user
to give an enhanced capability even to a conventional word processing or spreadsheet,
or other program.

For example, consider someone whose interest is developing a spreadsheet prediction
for company profit and loss. Today this is done exclusively using a keyboard to type in
data, and a mouse (typically) to direct the computer to different cells, pull down window
choices and the like. This approach is generally satisfactory, but can lead to carpal tunnel
syndrome and other health problems and is somewhat slow, requiring typing or mouse
movements that can overshoot, stick and the like.

Voice recognition can clearly be used to replace the typing, and gesture sensing
according to the invention, including specialized gestures or movements such as shown
in figure 5, can be used to improve recognition of voice inputs by the computer system.

But what else is possible? Clearly one can use the touch screen indicator aspect to
point directly at objects on the screen. For example, consider a user such as in figure 12
seated in front of a large high definition display screen on a wall or tilted 45
degrees as at a writing desk. The user can either touch (or nearly touch) the screen as in
fig 12, or he can point at the screen with his finger targeted with a retro-reflective Scotchlite
glass bead target, the pointing direction being calculated using the 3 target set on top
of his wrist as in fig 1b. The screen's datums are known, for example four retro-reflective
plastic reflector points at the corners 1270-1273 as shown. As elsewhere
discussed, projected targets on the screen can also be used to establish screen
locations, even individually with respect to certain information blocks if desired. A stereo
camera pair senses the positions of wrist and finger, and directs the computer and TV
projector (not shown) to follow the wishes of the user at the point in question. The user
may use his other hand or head, if suitably targeted or having suitable natural features,
to indicate commands to the camera computer system as well.
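
The pointing calculation can be sketched, under stated assumptions, as the intersection
of a ray with the screen plane. The sketch assumes the pointing ray runs from a single
wrist point (for instance the centroid of the 3 target set) through the finger target,
and that the screen is a flat rectangle located by three of its corner datums; the
function names and coordinates are hypothetical.

```python
def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pointed_location(wrist, finger, corner0, corner_x, corner_y):
    """Return (u, v) screen fractions hit by the wrist-to-finger ray, or None.

    corner0, corner_x, corner_y: 3-D positions of three corner datums, where
    corner_x and corner_y lie along the two edges leaving corner0. The screen
    is assumed rectangular, so u and v each run from 0 to 1 across it.
    """
    direction = _sub(finger, wrist)
    edge_x, edge_y = _sub(corner_x, corner0), _sub(corner_y, corner0)
    normal = _cross(edge_x, edge_y)
    denom = _dot(normal, direction)
    if abs(denom) < 1e-12:
        return None                      # pointing parallel to the screen
    t = _dot(normal, _sub(corner0, wrist)) / denom
    if t <= 0:
        return None                      # pointing away from the screen
    hit = tuple(w + t * d for w, d in zip(wrist, direction))
    rel = _sub(hit, corner0)
    return (_dot(rel, edge_x) / _dot(edge_x, edge_x),
            _dot(rel, edge_y) / _dot(edge_y, edge_y))

# Hypothetical 2 m x 1 m screen in the plane z = 0, user 2 m away.
print(pointed_location((1.0, 0.5, 2.0), (1.0, 0.5, 1.8),
                       (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

Values of u or v outside the range 0 to 1 indicate that the user is pointing past the
edge of the screen.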
Of interest is that the display can be in 3D using suitable LCD or other glasses to
provide the stereo effect. This allows one to pull the values out of the Excel chart and
make them extendable in another dimension. One can pull them out, so to speak, by
using, for example as shown in figure 6, two targeted fingers (e.g. a targeted thumb
and targeted finger) to grab or pinch and pull the object in the cell. In a word processor
the word on the page can be so grabbed.

One can use this effect to work backward from a 3D bar graph created by the spreadsheet
program, i.e. to press on the individual bars until the form of the data shown meets
one's goals. By pressing, as in a repeated downward finger motion, the user causes the
program to change the data in certain cell scenarios (e.g. sales, expenses, profits, etc.).

In another example, transparent targeted blocks may be moved over the top of a
transparent rear projection screen. The blocks can also extend in height above the
screen by a variable amount. Data can be inputted via the computer screen, but also by
varying the block height. The height is then encoded into the screen projection to
change the color or another parameter.

In the factory layout example of figure 14 above, if blocks are translucent and placed on
a screen, the colors, written description, or pictorial description (e.g. a lathe, or a mill) of
each block can be projected onto it from the screen, with the target data on the block
tracked and fed to the TV projection source.
Such an arrangement might be useful for other complex tasks, also in real time, as in
air traffic control.

Other target arrangements sufficient to determine pointing direction can also be used.
This pointing method can also be used to point at anything, not just screens. It is
especially useful with voice commands to tell the pointed item to do something. It is also
of use to cue the projection system of the TV image to light up the pointed area or
otherwise indicate where pointing is taking place.

For giving presentations to a group, the invention can operate in reverse from a normal
presentation computer; that is, the person standing giving the presentation can point at
the screen where the information is displayed, and what he pointed at, grasped, or
whatever is recorded by the cameras of the invention into the computer.

It is further noted that a laser pointer can be targeted and used for the purpose.

Figure 17

This embodiment illustrates the versatility of the invention, for both computer input and
music. As shown in figure 17A, a two camera stereo pair 1701 and 1702 connected to
computer 1704, such as mentioned above for use in games, toys and the like, can also
be used to actually read key locations on keyboards, such as those of a piano or
typewriter. As shown, letter keys or, in the piano case, musical note keys such as 1708 with
retro target 1720 on their rear, beneath the keyboard, are observed with the camera set
1701. A Z axis movement gives the key hit (and how much, if desired, assuming elastic
or other deformation in response to input function by player finger 1710), while the x
(and y if a black key, whose target is displaced for example) location of the key tells
which letter or note it is. Speakers 1703 and 1705 provide the music from a MIDI
computer digital to speaker audio translation.
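
A minimal sketch of this key reading, limited to white keys for brevity, might look as
follows. The key width, hit threshold, travel, and the choice to map depth of travel to
MIDI velocity are all assumptions for illustration (MIDI velocity more commonly reflects
striking speed); only the convention that MIDI note 60 is middle C is standard.

```python
WHITE_KEY_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of C D E F G A B

def key_event(x_mm, z_mm, key_width_mm=23.5, rest_z_mm=0.0,
              hit_threshold_mm=3.0, full_travel_mm=10.0, first_note=60):
    """Turn one key target reading into (midi_note, velocity), or None if not hit.

    x_mm: position of the target along the keyboard, measured from the left
          edge of the first white key (assumed to be middle C).
    z_mm: current height of the target; it drops below rest_z_mm when pressed.
    """
    depth = rest_z_mm - z_mm                 # downward travel of the key
    if depth < hit_threshold_mm:
        return None                          # key is at rest or barely touched
    index = int(x_mm // key_width_mm)        # which white key, counting from 0
    octave, step = divmod(index, 7)
    note = first_note + 12 * octave + WHITE_KEY_STEPS[step]
    velocity = min(127, round(127 * depth / full_travel_mm))
    return note, velocity

print(key_event(x_mm=30.0, z_mm=-10.0))   # second white key, fully depressed
print(key_event(x_mm=30.0, z_mm=-1.0))    # below the hit threshold
```

Black keys would need the y location of the displaced target as well, which this sketch
leaves out.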

For highest speed and resolution, useful with long keyboards, and where the objects to 
be observed are in a row (in this case the keys), the two cameras are in this instance 
composed of 2048 element Reticon line arrays operating at 10,000 readings per 
second. Specialized DSP processors to determine the stereo match and coordinates
may be required at these speeds, since many keys can be pressed at once. 

Alternatively, the piano player's finger tips, as disclosed in previous embodiments, can be
imaged from above the keyboard (preferably with retroreflective targets for highest
speed and resolution) to create knowledge of his finger positions. This, when coupled
with knowledge of the keyboard data base, allows one to determine what key is being
struck due to the z axis motion of the finger.

Fig 18 

Virtual musical instruments are another music creation embodiment of the invention. A
dummy violin surrogate such as 1820 in figure 18 can be provided which is played on
bowstrings, real or dummy, by a bow 1825, also real or dummy. The position of the
bow vis-a-vis the dummy violin body 1830 proper, and the position of the fingers 1840
(which may be targeted), gives the answer as to what music to synthesize from the
computer. It is envisioned that the easiest way to operate is to use retro-reflecting
datums such as dot or line targets 1830, 1831, 1832, and 1833, on all of the bow, violin,
and fingers, viewed with stereo camera system 1850 connected to computer 1851 and
one or more loudspeakers 1855.

Frequency response is generally adequate at the 30 frames per second typical of standard
television cameras to register the information desired, and interpolation can be used if
necessary between registered positions (of say the bow). This may not be enough to
provide the full timbre of the instrument, however. One can use faster cameras such as the
line arrays mentioned above (if usable), PSD cameras as in fig 22, and/or the techniques
below to provide a more desirable output.
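
As a simple illustration of interpolating between registered positions, the sketch below
linearly fills in intermediate points between successive camera frames. Linear
interpolation is an assumption here; the specification does not say which interpolation
is intended, and the function name and sample data are hypothetical.

```python
def interpolate_positions(samples, factor):
    """Insert linearly interpolated points so the track has `factor` times the rate.

    samples: list of (x, y, z) positions registered at successive camera frames.
    factor:  integer upsampling factor, e.g. 4 turns 30 readings/s into 120.
    """
    if factor < 1 or not samples:
        raise ValueError("need at least one sample and a factor of 1 or more")
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            t = k / factor
            out.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
    out.append(tuple(samples[-1]))
    return out

# Two hypothetical bow positions registered one TV frame (1/30 s) apart.
bow_track = [(0.0, 0.0, 0.0), (4.0, 0.0, 2.0)]
for point in interpolate_positions(bow_track, 4):
    print(point)
```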

The input from the targeted human, or musical instrument part (e.g. key or bow or
drumstick) may cause, via the computer, the output to be more than a note, for example a
synthesized sequence of notes or chords. In this manner one would play the
instrument only in a simulated sense, with the computer synthesized music filling in the
blanks, so to speak.

Similarly a display such as 1860 may be provided of the player playing the simulated
instrument surrogate. The display may use the data of positions of his hands in a few
positions, and interpolate between them, or call from memory more elaborate moves either
taught or from a library of moves, so that the display looks realistic for the music played
(which may also be synthesized) as noted above.

The display fill-in is especially easy if a computer model of the player is used, which can
be varied with the position data determined with the invention.

Fig 19 

Figure 19 illustrates a method for entering data into a CAD system used to sculpt a car
body surface, in which a physical toy car surrogate for a real car model, 1910,
representing for example the car to be designed or sculpted, is held in a designer's left
hand 1902, and sculpting tool 1905 in his right hand 1906. Both car and tool are sensed
in up to 6 degrees of freedom each by the stereo camera system of the invention,
represented by 1912 and 1913 (connected to a computer, not shown, used to process
the camera data, enter data into the design program, and drive the display 1915). The
objects are equipped with special target datums in this example, such as 1920-1922
on car 1910, and 1925-1927 on sculpting tool 1905. A display of a car to be designed
on the screen is modified by the action of the computer program responding to positions
detected by the camera system of the sculpting tool 1905 with respect to the toy car, as
the tool is rubbed over the surface of the toy car surrogate.

One can work the virtual model in the computer with tools of different shapes. Illustrated
are two tools 1930 and 1931, in holder 1940, of a likely plurality, either of which can be
picked up by the designer to use. Each has a distinctive shape by which to work the
object, and the shape is known to the design system. The location of the shaped portion
is also known with respect to the target datums on the tools such as 1950-1952. As the
tool is moved in space, the shape that it would remove (or alternatively add, if a build up
mode is desired) is removed from the car design in the computer. The depth of cut can
be adjusted by signaling the computer the amount desired on each pass. The tool can
be used in a mode to take nothing off the toy, or if the toy was of clay or coated in some
way, it could actually remove material to give an even more lifelike feel.
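
A heavily simplified sketch of one tool pass is given below. It assumes the car design is
held as a grid of surface heights rather than a full solid model, and that each tool's
known shape is stored as a footprint of fractional cut depths; these representations, and
all names used, are assumptions for illustration rather than the design system's actual
data structures.

```python
def apply_tool_pass(surface, tool_profile, row, col, depth_of_cut):
    """Return a new height map after one pass of the tool centred at (row, col).

    surface:      list of rows of surface heights for the virtual model.
    tool_profile: maps (d_row, d_col) offsets from the tool centre to the
                  fraction of the depth of cut removed at that offset.
    depth_of_cut: amount signaled for this pass; a negative value gives the
                  build up mode, adding material instead of removing it.
    """
    result = [list(r) for r in surface]      # leave the input untouched
    for (d_row, d_col), fraction in tool_profile.items():
        r, c = row + d_row, col + d_col
        if 0 <= r < len(result) and 0 <= c < len(result[r]):
            result[r][c] -= fraction * depth_of_cut
    return result

# Hypothetical flat 3 x 3 patch and a rounded tool that cuts deepest at its centre.
surface = [[10.0] * 3 for _ in range(3)]
round_tool = {(0, 0): 1.0, (0, 1): 0.5, (0, -1): 0.5}
carved = apply_tool_pass(surface, round_tool, row=1, col=1, depth_of_cut=2.0)
for line in carved:
    print(line)
```

In use, the (row, col) position would come from the sensed location of the tool's shaped
portion relative to the toy car, and the profile from the tool identified by its code.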






Three targets are shown, representatively on tool 1930, with three more optionally on the
other side for use if the tool becomes rotated with respect to the cameras. Each tool has
a code such as 1960 and 1961 that also indicates what tool it is, and allows the
computer to call up from memory the material modification effected by the tool. This
code can be in addition to the target datums, or one or more of the datums can include
the code.

Figure 20 

Figure 20 illustrates an embodiment of the invention used for patient monitoring in the
home or hospital. A group of retro-reflective targets such as 2021, 2030, and 2040 are
placed on the body of the person 2045 and are located in space relative to the camera
system (and if desired relative to the bed 2035, which also may include target 2036 to
aid its location), and dynamically monitored and tracked by stereo camera system 2020
composed of a pair of VLSI Vision 1000 x 1000 CMOS detector arrays and suitable
lenses.

For example, target 2021 on chest cavity 2022 indicates whether the patient is 
breathing, as it goes up and down. This can be seen by comparison of target location in
sequential images, or even just target blur (in the direction of chest expansion) if the 
camera is set to integrate over a few seconds of patient activity. 
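
The comparison of target location in sequential images can be sketched as follows. The
sketch assumes a series of chest target heights has already been extracted from the
images; the excursion threshold, the use of upward crossings of the mean height to count
breaths, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def count_breaths(heights_mm, min_excursion_mm=2.0):
    """Count upward crossings of the mean chest-target height.

    Returns 0 when the target moves less than min_excursion_mm peak to peak,
    which in a monitoring system would be the condition of concern.
    """
    if max(heights_mm) - min(heights_mm) < min_excursion_mm:
        return 0
    mean = sum(heights_mm) / len(heights_mm)
    return sum(1 for a, b in zip(heights_mm, heights_mm[1:]) if a < mean <= b)

# Synthetic example: 20 s of readings at 10 per second, one breath every 4 s.
breathing = [5.0 * math.sin(2 * math.pi * (t + 0.5) / 40) for t in range(200)]
still = [0.1] * 200
print(count_breaths(breathing), count_breaths(still))
```

A count of zero over a suitable window could be used to trigger the alarm function
described for this embodiment.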

Target 2030 on the arm, as one example of what might be many, is monitored to
indicate whether the patient is outside a desired perimeter, such as the bed 2035. If so,
computer 2080 is programmed to sound an alarm 2015 or provide another function, for
example alerting a remote caregiver who can come in to assist. A microphone, such as
2016, may also be interfaced to the computer to provide a listening function, and to
signal when help is needed.

Also illustrated is an additional target or targets on other portions of the chest or body,
such as 2040, so that if the patient, while asleep or otherwise, covers one with his arm,
the other can be sensed to determine the same information.

Also disclosed, like the figure above, is the conversion of a variable of the patient, in this
case blood pressure, into a target position that can be monitored as well. Pressure in
manometer 2050 causes a targeted indicator 2060 (monitored by an additional camera
2070, shown mounted to the end of the bed and achieving higher resolution if desired) to
rise and fall, which indicates pulse as well.

While described here for patients, the same holds true for babies in cribs, and the
prevention of sudden infant death syndrome (SIDS), by monitoring the rise and fall of their
chests during sleep, and to assure they are not climbing out of the crib or the like.

Figure 21

Following from the above, a simple embodiment of the invention may be used to
monitor and amuse toddlers and preschool age children. For example, in the figure 1
embodiment a Compaq 166 MHz Pentium computer 8, with Compaq 2D color TV
camera 10, was used, together with an Intel frame grabber and processor card to grab
and store the images for processing in the Pentium computer. This could see small retro
targets on a doll's or toddler's hands, with suitable LED lighting near the camera axis. The
toddler is seated in a high chair or walking around at a distance of, for example, several
feet from the camera mounted on top of the TV monitor. As the toddler moves his
hands (or, alternatively, moves the doll's hands), an object such as a doll image or a
modeled computer graphics image of a clown, let us say, could move up and down or side
to side on the screen (in the simple version of fig 1, only x and y motions of the toddler's
body parts or doll features are obtainable). For comfort and effect, the image of the
clown can also be taken or imported from other sources, for example a picture of the
child's father.

As the child gets older, single or dual camera stereo of the invention can be used to
increase the complexity with which the child can interact to 3, 4, 5, or 6 degrees of
freedom, with increasing sophistication in the game or learning experience.

Other applications of the invention are also possible. For example, the toddler can be
"watched" by the same TV camera periodically on alternate TV frames, with the image
transmitted elsewhere so his mother knows what he is doing.

His movements indicate as well what he is doing and can be used as another
monitoring means. For example, if he is running or moving at too great a velocity, the
computer can determine this by the rate of change of position coordinates, or by
observing certain sequences of motion indicative of the motion one desires to monitor.
Similarly, and like the patient example above, if the coordinates monitored exceed a
preset allowable area (e.g. a play space), a signal can be indicated by the computer.
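
Both checks can be sketched in a few lines. The sketch assumes a 2-D track of positions,
one per camera frame, and a rectangular allowable area; the function name, units, and
thresholds are hypothetical.

```python
import math

def monitor(track, frame_rate_hz, max_speed, allowed_area):
    """Return a list of (frame_index, reason) alerts for a 2-D position track.

    track:        list of (x, y) positions, one per frame.
    max_speed:    largest acceptable speed, in position units per second.
    allowed_area: (x_min, y_min, x_max, y_max) of the preset allowable area.
    """
    x_min, y_min, x_max, y_max = allowed_area
    alerts = []
    for i, (x, y) in enumerate(track):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            alerts.append((i, "outside area"))
        if i > 0:
            # rate of change of position between successive frames
            speed = math.dist(track[i - 1], (x, y)) * frame_rate_hz
            if speed > max_speed:
                alerts.append((i, "too fast"))
    return alerts

# Hypothetical track in metres at 30 frames per second, 2 m x 2 m play space.
track = [(0.0, 0.0), (0.1, 0.0), (1.1, 0.0), (3.0, 0.0)]
print(monitor(track, frame_rate_hz=30, max_speed=10.0, allowed_area=(0, 0, 2, 2)))
```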

The device is also useful for amusement and learning purposes. The toddler's wrists or
other features can be targeted, and when he claps, a clapping sound can be generated by the
computer in proportion, or with different characteristics or the like. The computer can be
programmed, using known algorithms and hardware, to talk to him, tell him to do
things, and monitor what he did, making a game out of it if desired. It also can aid
learning, giving him visual feedback and audio and verbal appreciation of a good
answer, score and the like.

Similarly, we believe the invention can be used to aid learning and mental development
in very young children and infants by relating gestures of hands and other bodily
portions or objects such as rattles held by the child, to music and/or visual experiences.

Let us consider the apparatus and method of fig 21, where we seek to achieve the
advantageous play and viewing activity, but also to improve the learning of young
children through the use of games, musical training and visual training provided by the
invention, in the case shown here starting with children in their crib, where they move
from the rattle to mobile to busy box (i.e. standing in crib) stage, the invention providing
enhanced versions thereof and new toys made possible through an LCD display attached
to the crib and the like. The second issue is what sorts of new types of learning
experiences can be generated that combine music, graphics and other things.

Consider fig 21, wherein an LCD TV display 2101 is attached to the end of crib 2102, in
which baby 2105 is lying, placed so the baby can see it. This display could be used to
display, for example, a picture of the child's parents or pets in the home, or other desired
imagery which can respond both visually and audibly to inputs from the baby sensed
with the apparatus of fig 1, or other apparatus of the invention. These are then used to
help illustrate the learning functions. The camera system, such as stereo pair 2110 and
2115, is located as shown on the edges of the LCD screen or elsewhere as desired,
and both cameras are operated by the computer 2135. Notice that the design with the
cameras integrated can be that of the laptop application of figure 22 as well.

The baby's hands, fingers, head, feet or any other desired portion can be targeted, on
his clothes or directly attached. Or natural features can be used if only simple actions
such as moving a hand or head are needed (all possible today with low cost computer
equipment suitable for the home). And importantly, the baby can easily hold a targeted
rattle such as 2130 having target datums 2152 and 2153 at the ends (whose sound may
be generated from the computer speaker 2140 instead, and be programmably changed
from time to time, or react to his input), and he may easily touch, as today, a targeted
mobile in the crib as well, or any other object such as a stuffed animal, block or
whatever.

In essence, the invention has allowed the baby to interact with the computer for the first
time in a meaningful way that will improve his learning ability, and IQ in future years. It is
felt by the inventors that this is a major advance.

Some learning enhancements made possible are:

A computer recorded voice (with associated TV image if desired) of the child's
parents or siblings, for example, calling the child's name or saying their names, is
responded to by the baby, and voice recognition picks up the child's response and
uses it to cue some sort of activity. This may not even be voice as we know it but the
sounds made by a child even in the early stages before it learns to talk. And it may
stimulate him to talk, given the right software.

The child can also move his hands or head and similar things can take place. For 
example, he can create music, or react to classical music (a known learning 
improvement medium today) perhaps by keeping time, or to cue various visual cues 
such as artistic scenes or family and home scenes that he can relate to certain 
musical scores and the like. 

The child can also use the computer to create art, by moving his hand, or the rattle 
or other object, and with some simple program, may be able to call up stored images 
as well. 

Another embodiment could have the child responding to stored images or sounds,
for example from a DVD disc read by the computer 2135, and in a sense voting on the
ones he liked, by responding with movement over a certain threshold level, say a
wiggle of his rattle. These images could later be played back in more detail if
desired. And his inputs could be monitored and used in professional diagnosis to
determine further programs to help the child, or to diagnose if certain normal
patterns were missing, thus perhaps identifying problems in children at a very early
age to allow treatment to begin sooner, or before it was too late.

The degree of baby excitement (amplitude and rate, etc. of rattle wiggle, or head or arm
movement) can also be sensed.

Note that in an ultimate version, data taken directly from the child, as in the fig 16
example, can be transmitted to a central learning center for assistance, diagnosis, or
directly for interactivity of any desired type.



Therapy and Geriatrics

It is noted that an added benefit of the invention is that it can be used to aid mute and
deaf persons who must speak with their hands. The interpretation of sign language can
be done by analyzing dynamic hand and finger position and converting, via a learning
sequence or otherwise, into computer verbiage or speech.

It is also noted that the invention aids therapy in general, by relating motion of a portion
of the body to a desired stimulus (visual, auditory or physical touch). Indeed the same
holds for exercise regimes of healthy persons.

And such activity made possible by the invention is useful for the elderly, who may be
confined to wheelchairs, unable to move certain parts of the body or the like. It allows
them to use their brain to its fullest, by communicating with the computer in a different
way.

Alternatively, stroke victims and other patients may need the action of the computer
imagery and audio in order to trigger responses in their activity to retrain them, much
like the child example above.

An interesting example too is elderly people who have played musical instruments but
can no longer play due to physical limitations. The invention allows them to create
music by using some other part of their body, and by using, if needed, a computer
generated synthesis of chords, added notes or whatever, to make up for their inability
to quickly make the movements required.

Other applications of the invention

One of the advantages of this invention is that all sorts of objects can be registered in
their function on the same camera system, operating in single, dual or other stereo
capabilities, and all at low cost. That the people, the objects, and the
whole stationary platform such as desk, floors, and walls can all be registered with the same
generic principles is a huge benefit of the application.

This means that the operating control software suitable for a large
number and variety of applications only has to be written once. And similarly the way in
which it operates, the way in which people interact with it, only has to be learned
once. Once one is familiar with one, one is almost familiar with all, and none need cost
more than a few dollars or tens of dollars by itself in added cost.

The standard application aspect of the invention is important too from the point of view
of sharing the cost of development of hardware, software, targets, material, etc. over the
largest possible base of applications, such that production economies are maximized.

This is much the same as the situation today, where one uses a mouse all the time,
for every conceivable purpose. But the mouse itself is not a natural object. One has to
learn its function, and, particular to each program, one may have to learn a different
function. Whereas in the invention herein described, it is felt by the inventors that all
functions are more or less intuitive and natural: the teaching, the games, the positioning
of objects on a CAD screen. All these are just the way one would do it in normal life. It is
possible to see this when one talks and how one uses one's hands to illustrate points or
to hold objects in position or whatever. Whatever you do with your hands, you can do
with this invention.

Speech recognition. 

One application of this is actually to aid in speech recognition. For example, in Italy in
particular, people speak with their hands. They don't speak only with their hands, but
they certainly use hand signals and other gestures to illustrate their points. This is not of
course true only of the Italian language, but the latter is certainly famous for it.



This invention allows one to directly sense these positions and movements at low cost. 
What this may allow one to do then is utilize the knowledge of such gestures to act as
an aid to speech recognition. This is particularly useful since many idiomatic forms of 
speech are not able to be easily recognized but the gestures around them may yield 
clues to their vocal solution. 

For example, it is comprehended by the invention to encode the movements of a
gesture and compare that with either a well known library of hand and other gestures
taken from the populace as a whole or one taught using the gestures of the person in
question. The person would make the gesture in front of the camera, the movements
and/or positions would be recorded, and he would record in memory, using voice or
keyboard or both, what the gesture meant, which could be used in future gesture
recognition, or voice recognition with accompanying gesture. A lookup table can be
provided in the computer software, where one can look up in a matrix of gestures,
including the confidence level therein and the meaning, and then compare that to,
or add it to, any sort of spoken word meaning that needs to be addressed.
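
One possible form of such a lookup table, and of its use alongside a speech recognizer,
is sketched below. The gesture names, meanings, confidence values, and the additive
weighting are all hypothetical; the specification does not fix how gesture confidence is
combined with the spoken word result.

```python
# Hypothetical gesture table: gesture name -> (meaning, confidence level).
GESTURE_TABLE = {
    "palm_up_shrug": ("uncertainty", 0.8),
    "thumbs_up": ("approval", 0.9),
}

def rescore(speech_hypotheses, gesture, table=GESTURE_TABLE, weight=0.5):
    """Pick the best speech hypothesis after adding in gesture evidence.

    speech_hypotheses: list of (text, meaning, score) from a voice recognizer.
    gesture:           name of the gesture recognized at the same moment.
    A hypothesis whose meaning matches the gesture's meaning has its score
    raised by weight * gesture confidence; unknown gestures change nothing.
    """
    entry = table.get(gesture)
    rescored = []
    for text, meaning, score in speech_hypotheses:
        if entry is not None and entry[0] == meaning:
            score += weight * entry[1]
        rescored.append((text, meaning, score))
    return max(rescored, key=lambda h: h[2])

hypotheses = [("I know", "certainty", 0.55), ("I dunno", "uncertainty", 0.50)]
print(rescore(hypotheses, "palm_up_shrug")[0])
print(rescore(hypotheses, "unrecognized_gesture")[0])
```

A table taught by the individual user would simply replace or extend the entries of the
library table.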

Artifacts

One of the advantages of the invention is that there is a vast number of artifacts that
can be used to aid the invention to reliably and rapidly acquire and determine the
coordinates of the object datums at little or no additional cost relative to the
camera/computer system. For example, we discussed retro-reflective targets on fingers,
belt buckles, and many forms of jewelry, clothing and accessories (e.g. buttons) and the
like. Many of these are decorative, and objects such as these can easily be designed and
constructed so that the target points represented are easily visible to a TV camera,
while at the same time being interpreted by a human as a normal part of the object
and therefore unobtrusive (see for example the referenced Tim Pryor copending
applications). Some targets indeed can be invisible and viewed with lighting that is
specially provided, such as ultraviolet or infrared.

Surrogates 

5 

An object, via the medium of software plus display screen and/or sound may also take 
on a life as a surrogate for something else. For example, a simple toy car can be held in 
the hand to represent a car being designed on the screen. Or the toy car could have 
been a rectangular block of wood. Either would feel more or less like the car on the 
screen would have felt, had it been the same size at least, but neither is the object
being designed in the computer and displayed on the screen. 

Surrogates do not necessarily have to "feel right" to be useful, but it is an advantage of
the invention, for natural application by humans, that the feel or touch of the object can
seem much like the object depicted on the screen display even if it isn't the same.

Anticipatory moves 

The invention can sense dynamically, and the computer connected to the sensor can
act on the data intelligently. Thus the sensing of datums on objects, targeted or not, can
be done in a manner that optimizes function of the system.

For example, if one senses that an object is rotating, and targets on one side are likely to
recede from view, then one can access a data base of the object that indicates what
targets are present on another side that can be used instead.
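A minimal sketch of such an anticipatory look up follows. It assumes, purely for illustration, that the object data base stores for each target the direction of its outward normal as an angle about the rotation axis, that an angle of 0 degrees points at the camera, and that a target is usable within 45 degrees of the camera direction. The function names and data layout are hypothetical.

```python
def visible_targets(target_normals_deg, object_angle_deg, half_angle_deg=45.0):
    """Return the ids of targets facing the camera at a given object rotation.

    target_normals_deg maps target id -> normal direction in the object
    frame, in degrees about the rotation axis. 0 degrees in the world frame
    is taken to point straight at the camera (an assumption of this sketch).
    """
    visible = []
    for target_id, normal in target_normals_deg.items():
        world = (normal + object_angle_deg) % 360.0
        off_axis = min(world, 360.0 - world)  # angular distance from camera
        if off_axis <= half_angle_deg:
            visible.append(target_id)
    return sorted(visible)

def anticipate_targets(target_normals_deg, angle_deg, rate_deg_per_s, lookahead_s):
    """Predict which targets will be usable a short time ahead."""
    predicted_angle = angle_deg + rate_deg_per_s * lookahead_s
    return visible_targets(target_normals_deg, predicted_angle)
```

For an object with targets on four sides, rotating at 90 degrees per second, a one second look ahead would tell the vision program to start searching for the target on the side about to come into view.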

Additional points 

It is noted that in this case, the word target or datum essentially means a feature on the
object or person for the purpose of the invention. As has been pointed out in previous
applications by Tim Pryor, these can either be natural features of the object, such as
fingernails or fingertips, hands and so on, or can be, as is often preferable, specialized
datums put on especially to assist the function of the invention. These typically include
datums that contrast with their surroundings through high brightness retro-reflection or
color variation, and that are often further or alternatively distinguished by some sort of
pattern or shape.

Examples of patterns can include the patterns on cloth such as stripes, checks, and so
on. For example, the pointing direction of a person's arm, in a sleeve of striped cloth
with the stripes running along the length of the sleeve, would be indicated by determining
the 3D pointing direction of the stripes. This can easily be done using edge detection
algorithms with the binocular stereo cameras here disclosed.

A useful shape can be a square, a triangle, or something not typically seen in the room,
desktop, or other area in which one would normally operate, such that it stands out. Or,
even if a common shape, the combination of the shape with a specific color or
brightness or both often allows recognition.

It is appreciated that beyond the simple 2 dimensional versions as described, such as in
figure 1, many applications benefit from, or even depend on, 3D operation. This is
disclosed widely within the application as being desirably provided either from a single
camera or from two or more cameras operating to produce stereo imagery that can be
combined to solve for the range distance z. However, z dimension data can also be
generated, generally less preferably, by other means, such as ultrasonics, radar, or
laser triangulation, if desired to effect the desirable features of many of the applications
described.
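As an illustration of solving for range from stereo imagery, the sketch below uses the standard relation for two parallel, identical cameras, z = f * B / d, where f is the focal length in pixels, B the camera separation, and d the disparity between the two image positions of the same target. The parallel-camera geometry and all parameter names are assumptions of this sketch; the application does not specify a particular camera arrangement.

```python
def stereo_range(x_left_px, x_right_px, focal_px, baseline_m):
    """Range z (metres) to a target seen by two parallel cameras.

    x_left_px, x_right_px: horizontal image position of the same target in
    the left and right images, in pixels measured from each image centre.
    focal_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centres, in metres.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("target must have positive disparity to be in front of the cameras")
    return focal_px * baseline_m / disparity
```

With a focal length of 800 pixels, a 0.1 m baseline and a disparity of 40 pixels, the target range comes out at about 2 m.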

Another point to stress concerning the invention is its performance of
multiple functions. This allows it to be shared amongst a large number of different users,
and different uses for the same user, with a commonality, as mentioned above, of the
teaching of its function, the familiarity with its use, and so forth.

One example of this is the use of a targeted hand, which one moment is for a game, the
next moment for a CAD input, and the next for music or whatever.



A key is the natural aspect of the invention: it enables, at low cost and high
reliability, the use of learned natural movements of persons for work, for play, for
therapy, for exercise, and for a variety of other work and safety uses here disclosed, and
similar to those disclosed.

Figures 1 to 3 have illustrated several basic principles of optically aided computer inputs 
using single or dual/multicamera (stereo) photogrammetry. Illustrated are new forms of
inputs to effect both the design and assembly of objects. 

When one picks up a polygon object, the TV image of the object itself can be processed, or more
likely special ID data on the object or incorporated with the target datums can be
accessed by the computer to recognize the object and call up the desired image, of the
object or of something it represents. Then as you move it, it moves; but as you elaborate
on the computer rendition of it in due course, given the user's input and work, it gradually
morphs to a car! (It could be a standard car instantly if the computer were told that the
polygon is a car.)

One can draw on the computer screen, on a pad of paper or easel, or in the air with the
invention. Computer instructions can come from all conventional sources, such as
keyboards, mice and voice recognition systems, but also from gestures and movement
sequences, for example using the TV camera sensing aspect of the invention.

Note that, for example, a targeted paint brush can instantly provide a real feeling way to
use painting type programs. While painting itself is a 2D activity on the paper, the 3D
sensing aspect of the invention is used to determine when the brush is applied to the
paper, or lifted off, and in the case of pressing the brush down to spread the brush, the z
axis movement into the plane of the paper determines how much spreading takes place
(paper plane defined as xy).

The 3D aspect is also used to allow the coordinate system to be transformed between
the xyz as so defined and the angulation of the easel with respect to the camera
system, wherever it is placed: typically overhead, in front, or to the side somewhere. This
freedom of placement is a major advantage of the invention, as is the freedom of choice
of where targets are located on objects, thanks in particular to the ability of the two
camera stereo system to solve all necessary photogrammetric equations.
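The coordinate transformation between the camera system and the paper or easel can be sketched as follows. The sketch assumes that the easel frame (an origin on the paper plus three mutually perpendicular unit axes, with z normal to the paper) has already been measured in camera coordinates, for example from targets on the easel. The function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def to_paper_frame(point, origin, x_axis, y_axis, z_axis):
    """Express a camera-frame point in paper coordinates.

    origin: a point on the paper, in camera coordinates.
    x_axis, y_axis: unit vectors lying in the paper plane (camera coordinates).
    z_axis: unit normal of the paper plane (camera coordinates).
    Returns (x, y, z), where z is the height of the point off the paper.
    """
    d = tuple(p - o for p, o in zip(point, origin))
    return dot(d, x_axis), dot(d, y_axis), dot(d, z_axis)
```

Because the transform depends only on the measured easel frame, the camera can be placed overhead, in front, or to the side without changing the paper coordinates that the painting program receives.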

Note too that the angle of the brush or a pen held in hand with respect to the z axis can
also be used to instruct the computer, as can any motion pattern of the brush, either on
the paper or waved in the air.
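The use of z axis movement to decide when the brush touches the paper and how far it spreads might be sketched as below. The linear mapping and every numeric limit are illustrative assumptions, not values from the application; z is taken as the height of the brush tip above the paper plane, negative when the brush is pressed in.

```python
def brush_state(z_mm, max_press_mm=5.0, min_width_mm=1.0, max_width_mm=12.0):
    """Map brush tip height to (touching, stroke_width_mm).

    z_mm > 0 means the brush is lifted off the paper (nothing is drawn).
    z_mm <= 0 means the brush is applied; pressing further into the paper
    spreads the stroke linearly up to max_width_mm (assumed mapping).
    """
    if z_mm > 0:
        return False, 0.0
    press = min(-z_mm, max_press_mm)
    width = min_width_mm + (max_width_mm - min_width_mm) * press / max_press_mm
    return True, width
```

A painting program would call this once per camera frame, together with the xy position of the brush tip, to decide whether to lay down paint and how wide the stroke should be.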

In CAD activities, the computer can be so instructed as to parametric shape parameters,
such as % of circle and square. As with the brush, the height in z may be used to
control an object width, for example.

Illustrated too is a computer aided design (CAD) system embodiment according to the
invention, which illustrates particularly the application of specialized sculpture tools with
both single and two alias object inputs, useful for design of automobiles, clothes and
other applications.

The physical feel of the object in each hand is unique, and combines feel with sight on
screen: it feels like what it is shown to be, even if it isn't really. The feel can be rigid or
semi rigid, or indeed one can actually remove (or add) material from the alias object.

Illustrated were additional alias or surrogate objects according to the invention, including
two or more used together, for example for use in sculpture, whittling and other solid
design purposes with one, two, or more coordinated objects.

The unique ability of the invention to easily create usable and physically real alias
objects results from the ease in creating targeted objects which can be easily seen at
high speed by low cost TV and computer equipment. High speed is here defined as
greater than 3 frames per second, say, and low cost as under $5000 for the complete
system including camera, light source(s), computer and display (a multiple camera
version being somewhat higher).

The objects can be anything on which 3M Scotchlite 7615 type retro-reflective
material can be placed, or which has other reflective or high contrast material
incorporated into its surface. Targets can be stuck on fingers, toys or whatever, and can be
easily removed if desired. With two (or more) camera stereo systems, no particular way
of putting them on is needed: one can solve photogrammetrically for any non co-linear
set of three to determine object position and orientation, and any one target can be
found in x, y and z.
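Once three non co-linear targets have each been located in x, y and z, object position and orientation follow from simple vector algebra. The sketch below is one conventional way of building an object frame from three points; the choice of which target defines the origin and the x axis is an arbitrary assumption of this illustration.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v, eps=1e-9):
    n = math.sqrt(sum(x * x for x in v))
    if n < eps:
        raise ValueError("targets are coincident or co-linear")
    return tuple(x / n for x in v)

def pose_from_three_targets(p0, p1, p2):
    """Return (origin, x_axis, y_axis, z_axis) of the object frame.

    p0 is used as the object position, the x axis points from p0 to p1,
    and the z axis is normal to the plane of the three targets.
    """
    x_axis = unit(sub(p1, p0))
    z_axis = unit(cross(sub(p1, p0), sub(p2, p0)))  # fails if co-linear
    y_axis = cross(z_axis, x_axis)
    return p0, x_axis, y_axis, z_axis
```

The three axes form the rotation of the object relative to the camera system, and p0 gives its position, which together make up the six degrees of freedom. Co-linear targets are rejected because they leave the rotation about their common line undetermined.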

The physical nature of the alias object is a very important aspect of the invention. It
feels like a real object: even though it is a simple targeted block, one feels that it is a car
when viewing the car representation on the screen that the block position commands.
One feels the object and looks at the screen; this is totally different from controlling an
object on a screen with a mouse.

Even more exciting and useful is the relative juxtaposition of two objects, with both on 
the screen. 

For example, a child can affix special targets (using Velcro, tape, pins, or other means)
on his favorite stuffed toys and then have them play with each other, or even with a
third. Or two children can play, each with their own doll or stuffed animal. But on screen,
they convert the play into any kind of animal, including scenery (e.g. a barnyard). The
animals can have voice added in some way, either by the computer, or by prerecorded
sounds, or in real time via microphones. Via the internet, new voice inputs or other
game inputs can be downloaded at will from assisting sites, and programs, voice,
and TV imagery can be exchanged between users.

Computer imagery of the actual animal can be taken using the same TV camera, 
recorded, and the 3D position determined during play, and the image transformed into a 
3D image, rotated or whatever. 

The same argument of attaching targets to toys applies to objects which are the
physical manifestations of learned skills: 

A pencil to a draftsman; 

A scissors, chalk, and rule to a dressmaker; 

A brush to an artist; 

An instrument or portion (e.g. a drumstick, a bow) to a musician;

An axe to a lumberjack;

A drill, hammer, or saw to a carpenter; 

A pistol to a policeman or soldier; 

A scalpel to a surgeon; 

A drill to a dentist; 

And so on. 

Each person can use a real, or alias object (e.g. a broomstick piece for a hammer) 
targeted as he chooses, in order to use the audio and visual capabilities of computer 
generated activity of the invention. All are more natural to him or her, than a mouse! In 
each case too, the object to be worked on can also be sensed with the invention: 
The cloth of the dress;

The paper (or easel/table) of the artist or draftsman;

The violin of the musician (along with the bow);

The log of the lumberjack;

The teeth or head of the dental patient;

And so on.

The computer program, using the sensor input, can faithfully utilize the input, or it can
extrapolate from it. For example, rather than play middle C, it can play a whole chord, or,
knowing the intended piece, play several of the notes in that piece that follow. Similarly,
one can start a simulated incision with a scalpel, and the program can continue it a
distance along the same path the student doctor started.

Sounds, Noise and visual cues

The cocking of a hammer on a toy pistol can act as a cue in many cases. A microphone
connected to the computer can pick this up, analyze the signature, and determine
that a gun may be fired. This can cause the vision analysis program looking at the TV
image to look for the pistol, and to anticipate the shot. The sound of the gun, rather than
a visual indicator, can alternatively be used to cue the displayed image data as well.
Two microphones, if used, can triangulate on the sound source, and even tell
the TV camera where to look.
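One common way to locate a sound with two microphones is to measure the difference in arrival time and convert it to a bearing. The sketch below assumes a distant source (far-field approximation) and a nominal speed of sound; both are assumptions of this illustration rather than details from the application. A single microphone pair gives a direction relative to the line joining the microphones, not a full position.

```python
import math

def bearing_from_delay(delay_s, mic_spacing_m, speed_of_sound_m_s=343.0):
    """Bearing of a distant sound source, in degrees from broadside.

    delay_s: arrival time at microphone A minus arrival time at
    microphone B (positive when the source is nearer to B).
    mic_spacing_m: distance between the two microphones.
    Returns 0 for a source straight ahead, +90 or -90 for a source on the
    line through the microphones.
    """
    s = speed_of_sound_m_s * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # guard against noise pushing |s| above 1
    return math.degrees(math.asin(s))
```

The resulting bearing could be used to narrow the region of the TV image that the vision program searches for the pistol.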

In many cases sound and physical action are related. Sound, for example, can be used
to pick up a filing noise, to indicate that an alias object is actually being worked by a
tool. The TV camera(s) can monitor the position and orientation of each, with the actual
contact registered by sound. Alternatively, contact could be inferred just from the physical
proximity of one image to another; however, the sound is created by the actual physical
contact, which is more accurate, and more real to the user.

Signature recognition 

The invention can look for many signatures of object position and movement, including
complex sequences. This has been described in another context relative to fig 7 for
recognizing human gestures.

The recognition algorithm can be taught beforehand using the position or movement in
question as an input, or it may be preprogrammed to recognize data presented to it
from a library, often specific to the game or activity of interest.

Such recognition can also be used to anticipate an action. For example, if a bow string
or hand is moved directly back from a bow, it is recognized that one is drawing a bow,
and that an arrow may be ready to be shot. The computer can then command the
screen display or sound generation speakers to react (eyes or head move, person on
screen runs away, etc.).

Similarly, the actual action of releasing the bow can be sensed, and the program can react
to the move.

It is of use to consider some of what even the simplest version of the invention,
illustrated in fig 1a, could accomplish. In the lowest cost case, this uses retroreflective
glass bead tape, or jewelry, on an object to allow determination in x and y (the plane
perpendicular to the camera axis) of, for example:

1. Position of one or more points on, or portions of, or things to do with, babies, game
players, old persons, disabled, workers, homemakers, etc.

2. Position of an object, such as something representing the position or value of
something else.

3. Location of a plurality of parts of the body, a body and an object, two
objects simultaneously, etc.

4. With additional software and datums, expand to the fig 1b version, and determine up to
six degrees of freedom of an object, or of one or more objects with respect to
each other, using a single camera but with a target set having known relationships
(single camera photogrammetry).

Today, the costs involved to do the foregoing would appear to be a USB camera and, in the
simplest case, no frame board; just right into the computer. This today could result in
images being processed at maybe 10 hertz or less. Simple thresholding, and probably
color detection, would be all that would be needed. More sophisticated shape recognition
and finding of complex things in the scene are not required in simple cases with limited
background noise, and are aided by use of the retroreflector or LED sources.
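Simple thresholding of this kind can be sketched as follows: every pixel brighter than a chosen level is treated as part of the retroreflective or LED target, and the target position is taken as the centroid of those pixels. The image representation (a list of rows of 0 to 255 intensities) and the threshold value are assumptions made to keep the illustration self-contained.

```python
def find_target_centroid(image, threshold=200):
    """Locate a single bright target in a grayscale image.

    image: list of rows, each a list of pixel intensities (0..255).
    Returns the (x, y) centroid of all pixels at or above the threshold,
    in pixel units, or None if no pixel is bright enough.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return sum_x / count, sum_y / count
```

Averaging over all bright pixels gives a position finer than one pixel for a target that covers several pixels. With more than one target in view, the bright pixels would first have to be grouped into separate blobs, which this sketch does not attempt.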

The only other equipment that would be needed in this scenario is the lighting unit that
would surround the camera. Clearly this would be somewhat camera specific in terms of
its attachment and so on. Many cameras, it would appear, have been designed
for internet use.

Cameras and lighting as needed could be built right into the TV display units. 

In the simplest case, there would be simply one target and one only. This would allow a
simple TV camera to give 2D point position, essentially acting as a 2D mouse in space (except
that the absolute position of the point relative to the camera can be determined, whereas the
mouse of today is incremental from its starting point).

Some applications:

1. Direct mouse replacement. The mouse today is in 2D and so is this. Generally
speaking, depending on where the camera is, this is either the same two
dimensions, that is, looking down at the work space, or the two dimensions are in
another plane.

2. Indeed one could apply a single target capable of being sensed by the TV camera of
the invention on the ordinary mouse (or joystick or other input) of today. This could
give more degrees of freedom of information, such as angles or movement off the
mouse table surface (z direction). For example, a 3D input device can be produced
since the camera would provide XZ (z perpendicular to plane of surface) and the
mouse would provide XY (in plane of surface), so therefore you would have all three
dimensions.

3. Carrying the mouse elaboration one step further, a mouse point could be movable.
That is, the target could be wiggled by the finger holding the mouse, to signal a
move or other action to the computer. This would then allow you to put inputs to the
computer into the device without adding any electrical wires or anything.

4. Transducers can also be used as single point inputs, for example of pressures or
temperatures or anything that would make a target move, for example in the latter
case the target being on the end of a bimetal strip which changes position with
temperature.

Application to multiple points and objects 

Another application is to register the relative position of one object to another. For 
example, today the mouse is basically an odometer. It can't really give any positional 
data relative to something but can only give the distance moved in two directions which 
is then converted from some home location onto the screen. 

The invention however is absolute, as the camera is as well. It can provide data on any
point relative to any other point or even to groups of points, on objects, humans, or both.
Even using the simplest form of the invention, one can put a target on a human and
track it or find its position in space. Here again, this is at first in, for example, two
dimensions, X and Y only (fig. 1a).

For example, with a single point one can make a mouse adjunct, where moving one's
head with a target on it provides an input into the computer while still holding the mouse
and everything in normal juxtaposition.

One step beyond this is to have more than one point on the human. Clearly one can sense a
finger relative to another finger, or a hand relative to another hand, and either or both
relative to the head, and so on.



As has been noted, a method of achieving high contrast and therefore high reliability is
to utilize an LED source as the target. This is possible with the invention, but requires
wiring on the object, and thus every object that is to be used has to have a power cable
or a battery, or a solar cell or other means to actuate the light, a disadvantage if
widespread applicability is desired.

The LED in its simplest form can be powered by something that itself is powered. This 
means an LED on top of the mouse for example. On the other hand, typically the LED 
would be on an object where you would not like a power cable and this would then 
mean battery operated. 

The idea of remote power transmission to the target LED or other self luminous target
should, however, be noted. It is possible to transmit electromagnetic radiation (radio, IR,
etc.) to a device on an object, which in turn would generate power for an LED, which then
converts that power to DC or modulated light capable of detection optically. Or the device
itself can directly make the conversion.

The basic technical embodiment of the invention illustrated in fig 1 uses a single TV
camera for viewing a group of 3 or more targets (or special targets able to give up to a 6
degree of freedom solution), or a set of at least two TV cameras for determining the 3D
location of a number of targets individually, and in combination to provide object
orientation. These cameras are today adapted to the computer by use of the USB port
or, better still, FireWire (IEEE 1394). The cameras may be employed to sense natural
features of objects as targets, but today, for cost and speed reasons, are best used with
high contrast targets such as LED sources on the object, or more generally with
retro-reflective targets. In the latter case lighting, as with IR LEDs, is provided near the
optical axis of each camera used. For scene illumination, which can best be done on camera
frames alternate from those used for target image acquisition, broad light sources can be
used. Laser pointers are also very useful for creating one or more high contrast indications,
simultaneously or in sequence, on object surfaces that can be sensed by the stereo
cameras (typically two or more).

Using laser (or other triangulation source projection), or the contacting of an object with 
a targeted finger or stylus member, an object can be digitized using the same camera 
system used for target related inputs. This is an important cost justification of total 
system capability. 

Coincidence of action, i.e. a sensed gesture using the invention, can be used to judge a
voice operated signal legitimate in a noisy background. Similarly, other inputs can be
judged effectively if combined with the position and movement sensing of the invention.

The invention combined with voice input makes the user much more portable. For example,
one can walk around the room and indicate to the computer both action and words.

The target, if a plain piece of glass bead retroreflector, typically cannot be seen beyond
angles of plus or minus 45 degrees from the normal of the reflector aligned with the
camera viewing axis (indeed some material drops out at 30 degrees). When a performer
spins around, this condition is easily exceeded, and the data drops out. For this reason,
targets pointing in different directions may be desirable. Rather than using several
planar targets with the above characteristics, each pointed in a different direction, say
rotationally about the head to toe axis of a dancer, one can use in some cases
multi-directional targets, typically large balls, beads and faceted objects such as
diamonds.

In some cases only 3D locations are needed. The orientation at times is a secondary
consideration. In these cases the target 1650 could be attached to a gyroscope 1655 that
in turn is attached to a base 1660 by a ball joint 1665 or other free floating mechanical
link. The target could be initially tilted directly toward the cameras, allowing the cameras
to view the target more precisely. The base plate is then attached to the object to be
tracked. The position of the attachment can be calculated once the target location and
orientation are established. Since the gyroscope would hold the target orientation
toward the cameras as the dancer turns, this method extends the range of motion
allowed to the dancer or other users.

It should be noted that many of the embodiments of the invention described do not
depend on TV cameras, stereo imaging, special targets, or the like, but rather can be
used with any sort of non contact means by which to determine the position of a point,
multiple points, or the complete position and orientation of the object, or portion of a
human, used in the embodiment. While optical, and particularly TV camera based, systems
are preferred for their low cost and wide functionality, ultrasonics and microwaves can also
be used as transduction means in many instances.

Specialized DEFINITIONS used in the application

Target Volume

A "Target Volume" is the volume of space (usually a rectangular solid volume)
visible to a video camera or a set of video cameras within which a target will be
acquired and its position and/or orientation computed.

Interrupt member 

An "Interrupt member" is a device that sends a signal to the system's computer,
allowing a computer program to identify the beginning of one path of a target and the
end of the preceding path. It can also identify a function, object, or parameter value.
Examples of an Interrupt member are:

1. A given key on the system's keyboard.

2. A voice recognition system capable of acting on a sound or spoken word.

3. A button attached to a game port, serial port, parallel port, special input card,
or other input port.

4. A trigger, switch, dial, etc. that can turn on a light or mechanically make
visible a new target or sub-target with unique properties of color, shape, and
size.



Quant

A "Quant" is a unique discretized or quantized target path (defined by location,
orientation, and time information) together with the target's unique identification number
(ID). A Quant has an associated ID (identification number). A Quant is composed of a
sequence of simple path segments. An example of a Quant that could be used to define a
command in a CAD drawing system to create a rectangle might be a target sweep to
the right punctuated with a short stationary pause, followed by an up sweep and pause,
a left sweep and pause, a down sweep and pause, and finally ended with a key press
on the keyboard. In this example the Quant is stored as a set (4, 1, 2, 3, 4, a, 27), where
4 is the number of path segments, 1-4 are numbers that identify path segment directions
(i.e. right, up, left, down), "a" is the Interrupt member (the key press a), and 27 is the
target ID. Note that the punctuation that identifies a new path direction could instead have
been a radical change in path direction or target orientation or speed.
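The rectangle example can be made concrete with a short sketch that turns a sampled target path into the stored set. Several things here are assumptions for illustration only: the path is given as (x, y) samples at a uniform rate with y increasing upward, a pause is detected when consecutive samples move less than a small distance, and the function and constant names are hypothetical.

```python
import math

# Direction codes as in the example: 1 right, 2 up, 3 left, 4 down.
DIRECTION_CODES = {"right": 1, "up": 2, "left": 3, "down": 4}

def classify(dx, dy):
    """Name the dominant direction of a path segment (y assumed upward)."""
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

def encode_quant(samples, interrupt_key, target_id, pause_dist=0.5):
    """Encode a sampled path as (n, code_1, ..., code_n, interrupt, target_id).

    Segments are separated by stationary pauses, i.e. consecutive samples
    closer together than pause_dist.
    """
    codes = []
    moving = False
    seg_start = samples[0]
    for prev, cur in zip(samples, samples[1:]):
        step = math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        if step >= pause_dist:
            if not moving:          # a new sweep begins
                seg_start = prev
                moving = True
        elif moving:                # a pause ends the current sweep
            codes.append(DIRECTION_CODES[classify(prev[0] - seg_start[0],
                                                  prev[1] - seg_start[1])])
            moving = False
    if moving:                      # path ended without a final pause
        last = samples[-1]
        codes.append(DIRECTION_CODES[classify(last[0] - seg_start[0],
                                              last[1] - seg_start[1])])
    return (len(codes), *codes, interrupt_key, target_id)
```

A path that sweeps right, up, left and down with a pause after each sweep, ended by the key press "a" for target 27, would encode as (4, 1, 2, 3, 4, "a", 27), matching the stored set in the example.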

Light as used herein includes all electro-magnetic wavelengths from ultraviolet to near
infrared.